Hacking Web Intelligence

Open Source Intelligence and Web Reconnaissance Concepts and Techniques

Sudhanshu Chauhan
Nutan Kumar Panda






ELSEVIER 


AMSTERDAM • BOSTON • HEIDELBERG • LONDON 
NEW YORK • OXFORD • PARIS • SAN DIEGO 
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO 


Syngress is an imprint of Elsevier 




Acquiring Editor: Chris Katsaropoulos 
Editorial Project Manager: Benjamin Rearick 
Project Manager: Punithavathy Govindaradjane 
Designer: Matthew Limbert 

Syngress is an imprint of Elsevier 

225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2015 Elsevier Inc. All rights reserved. 

No part of this publication may be reproduced or transmitted in any form or by any means, 
electronic or mechanical, including photocopying, recording, or any information storage 
and retrieval system, without permission in writing from the publisher. Details on how to 
seek permission, further information about the Publisher’s permissions policies and our 
arrangements with organizations such as the Copyright Clearance Center and the Copyright 
Licensing Agency, can be found at our website: www.elsevier.com/permissions. 

This book and the individual contributions contained in it are protected under copyright by 
the Publisher (other than as may be noted herein). 

Notices 

Knowledge and best practice in this field are constantly changing. As new research and 
experience broaden our understanding, changes in research methods, professional practices, 
or medical treatment may become necessary. 

Practitioners and researchers must always rely on their own experience and knowledge 
in evaluating and using any information, methods, compounds, or experiments described 
herein. In using such information or methods they should be mindful of their own safety and 
the safety of others, including parties for whom they have a professional responsibility. 

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, 
assume any liability for any injury and/or damage to persons or property as a matter of 
products liability, negligence or otherwise, or from any use or operation of any methods, 
products, instructions, or ideas contained in the material herein. 

ISBN: 978-0-12-801867-5 

British Library Cataloguing in Publication Data 

A catalogue record for this book is available from the British Library 

Library of Congress Cataloging-in-Publication Data 

A catalog record for this book is available from the Library of Congress 


For Information on all Syngress publications 
visit our website at store.elsevier.com/Syngress 


ELSEVIER — Working together to grow libraries in developing countries
www.elsevier.com • www.bookaid.org




Contents 


Preface.xiii 

About the Authors.xv 

Acknowledgments.xvii 

CHAPTER 1 Foundation: Understanding the Basics.1 

Introduction.1 

Internet.2 

Definition.2 

How it works.2 

World Wide Web.3 

Fundamental differences between internet and WWW.3 

Defining the basic terms.3 

IP address.3 

Port.4 

Protocol.4 

MAC address.5 

E-mail.5 

Domain name system.5 

URL.6 

Server.7 

Web search engine.7 

Web browser.7 

Virtualization.7 

Web browsing—behind the scene.8 

Lab environment.10 

Operating system.10 

Programming language.11 

Browser.12 

CHAPTER 2 Open Source Intelligence and Advanced Social 

Media Search.15 

Introduction.15 

Open source intelligence.16 

How we commonly access OSINT.16 

Search engines.16 

News sites.17 

Corporate websites.17 

Content sharing websites.17 









































Academic sites.18 

Blogs.18 

Government sites.19 

Web 2.0.19 

Social media intelligence.20 

Social network.20 

Introduction to various social networks.21 

Advanced search techniques for some specific social media.25 

Facebook.25

LinkedIn.27

Twitter.30 

Searching any open social media website.31 

Web 3.0.31 

CHAPTER 3 Understanding Browsers and Beyond.33 

Introduction.33 

Browser operations.33 

History of browsers.34 

Browser architecture.34 

User interface.35 

Browser engine.35 

Rendering engine.35 

Networking.35 

UI backend.35 

JavaScript interpreter.35 

Data persistence.35 

Error tolerance.36 

Threads.36 

Browser features.36 

Private browsing.36 

Autocomplete.37 

Proxy setup.38 

Raw browsers.38 

Why custom versions?.39 

Some of the well-known custom browsers.40 

Epic.40 

HconSTF.40 

Mantra.41 

FireCAT.43 

Oryon C.44 













































WhiteHat Aviator.44 

TOR bundle.45 

Custom browser category.45 

Pros and cons of each of these browsers.46 

Addons.46 

Shodan.48 

Wappalyzer.48 

Buildwith.48 

Follow.49 

Riffle.49 

WhoWorks.at.50

Onetab.50 

SalesLoft.51

Project Naptha.51 

Tineye.51 

Reveye.51 

Contactmonkey.52 

Bookmark.52 

Threats posed by browsers.52 

CHAPTER 4 Search the Web—Beyond Convention.53 

Introduction.53 

Meta search.54 

People search.56 

Business/company search.59 

Reverse username/e-mail search.60 

Semantic search.62 

Social media search.63 

Twitter.65 

Source code search.66 

Technology information.67 

Reverse image search.69 

Miscellaneous.71 

CHAPTER 5 Advanced Web Searching.77 

Introduction.77 

Google.78 

Bing.85 

Yahoo.88 

Yandex.90 












































CHAPTER 6 OSINT Tools and Techniques.101 

Introduction.101 

Creepy.102 

TheHarvester.105 

Shodan.107 

Search Diggity.110 

Recon-ng.113 

Case 1 .119 

Case 2.119 

Case 3.120 

Case 4.120 

Yahoo Pipes.121 

Maltego.124 

Entity.124 

Transform.124 

Machine.125 

Investigate.126 

Manage.126 

Organize.126 

Machines.127 

Collaboration.127 

Domain to website IP addresses.128 

Domain to e-mail address.129 

Person to website.130 

CHAPTER 7 Metadata.133 

Introduction.133 

Metadata extraction tools.134 

Jeffrey’s Exif Viewer.134 

Exif Search.136 

ivMeta.137 

Hachoir-metadata.138 

FOCA.139 

Metagoofil.140 

Impact.142 

Search Diggity.143 

Metadata removal/DLP tools.145 

MetaShield Protector.145 

MAT.145 

MyDLP.145 













































Flowcharts.192 

Maltego.193 

CaseFile.194 

MagicTree.196 

KeepNote.197 

Lumify.198 

Xmind.199 

CHAPTER 11 Online Security.203 

Introduction.203 

Malwares.205 

Virus.206 

Trojan.206 

Ransomware.206

Keylogger.206 

Phishing.207 

Online scams and frauds.207 

Hacking attempts.208 

Weak password.208 

Shoulder surfing.208 

Social engineering.209 

Antivirus.209 

Identify phishing/scams.210 

Update operating system and other applications.210 

Addons for security.211 

Web of trust (WOT).211 

HTTPS Everywhere.212 

NoScript.212 

Tools for security.212 

Password policy.213 

Precautions against social engineering.215 

Data encryption.215 

CHAPTER 12 Basics of Social Networks Analysis.217 

Introduction.217 

Nodes.217 

Edges.218 

Network.218 










































Gephi.218 

Overview.218 

Data laboratory.219 

Preview.220 

Node attributes.220 

Edge attributes.221 

Direction.221 

Type.221 

Weight.221 

Ranking.222 

Betweenness.222 

CHAPTER 13 Quick and Dirty Python.229 

Introduction.229 

Programming versus scripting.229 

Introduction to Python.230 

Installation.230 

Modes.230 

Hello world program.231 

Identifiers.232 

Data types.232 

Indentation.235 

Modules.238 

Functions.239 

Classes.239 

Working with files.240 

User input.242 

Common mistakes.243 

Maltego transforms.245 

Resource.251 

CHAPTER 14 Case Studies and Examples.253 

Introduction.253 

Case studies.253 

Case study 1: The BlackHat Mashup.253 

Case study 2: A demo that changed audience view.255 

Case study 3: An epic interview.257 

Maltego machines.263 









































CHAPTER 15 Related Topics of Interest.267 

Introduction.267 

Cryptography.267 

Basic types.267 

Data recovery/shredding.269

Internet Relay Chat.271 

Bitcoin.272 

Index.275 












Preface 


It was just another day at work; as usual we were supposed to configure some scans,
validate some results, and perform some manual tests. We had been working with
our team on some pentesting projects. Unlike many other jobs, pentesting is not that
boring; honestly, who doesn't enjoy finding flaws in someone's work and getting paid
for it? So, following the process, we did some recon and found some interesting
information about the target. We started digging deeper and soon we had enough
information to compromise the target. We finished the rest of the process and sent out
the reports to the clients, who were more than happy with the results.

Later that evening we were discussing the tests and realized that most of
the information which allowed us to get a foothold in the target was actually public
information. The target had already revealed too much about itself and it was just a
matter of connecting the dots. It ended there and we almost forgot about it. Another
fine day we were working on some other project and the same thing happened again.
So we decided to document all the tools and techniques we were aware of and create a
shared document, which we both could contribute to. Any time we encountered some
new method to discover public information, we added it to the document. Soon we
realized that the document had grown too long and we needed to categorize and filter it.

Though the topic has been widely known and utilized in pentesting and red team
exercises, when we tried to find documented work on it, we didn't find anything
substantial. This is where we started thinking of converting our document into
a book.

While researching the topic we understood that there is too much public
information which is easily accessible. Most of it might not seem very useful at first
glance, but once collected and correlated it can bring phenomenal results. We also
realized that it is not just pentesting where it is of prime importance to collect
information about the target; there are many other professions which utilize similar
methods, such as sales reps finding information about prospective clients, or marketing
professionals collecting information related to the market and competition. Keeping that
in mind we have tried to keep the tone and flow of the book easy to follow, without
compromising on the technical details. The book moves from defining the basics to
learning more about the tools we are already familiar with and finally toward more
technical stuff.


WHAT THIS BOOK COVERS 

Hacking Web Intelligence has been divided into different sections according to
complexity and mutual dependency. The first few chapters are about the basics and
dive deep into topics most of us are already familiar with.




The middle section talks about the advanced tools and techniques, and in the later
portion we will talk about actually utilizing and implementing what we discussed in
the previous sections.

While following the book it is suggested not just to read it but to practice it. The
examples and illustrations are included to show how things work and what to
expect as a result. It is not just about using a tool but also understanding how it does
so, as well as what to do with the information collected. Most of the tools will be able
to collect information, but to complete the picture we need to connect these dots. On
the other hand, like any tool, the ones we will be using might be updated, modified,
or even deprecated, and new ones might show up with different functionality, so stay
updated.


HOW DO YOU PRACTICE 

A desktop/laptop with any operating system; different browsers such as Mozilla
Firefox, Chrome, or Chromium; and internet connectivity. Readers will be assisted in
downloading and installing tools and dependencies based on the requirements of each chapter.


WHO THIS BOOK IS FOR

The book focuses mainly on professionals related to information security/intelligence/
risk management/consulting, but unlike "from Hackers to the Hackers" books
it will also be helpful and understandable to laymen who require information
gathering as a part of their daily job, such as marketing, sales, journalism, etc.

The book can be used in any intermediate level information security course for
the reconnaissance phase of the security assessment.

We hope that as a reader you learn something new which you can practice in
your daily life to make it easier and more fruitful, like we did while creating it.

Sudhanshu Chauhan

Principal Consultant, Noida, India 

Nutan Kumar Panda 

Information Security Engineer, Bangalore, India 




About the Authors 


SUDHANSHU CHAUHAN 

Sudhanshu Chauhan is an information security professional and OSINT specialist.
He has worked in the information security industry, previously as a senior security
analyst at iViZ and currently as director and principal consultant at Octogence Tech
Solutions, a penetration testing consultancy. He previously worked at the National
Informatics Center in New Delhi developing web applications to prevent threats. He
has a BTech (CSE) from Amity School of Engineering and a Diploma in cyber security.
He has been listed in various Halls of Fame such as Adobe, Barracuda, Yandex,
and Freelancer. Sudhanshu has also written various articles on a wide range of topics
including Cyber Threats, Vulnerability Assessment, Honeypots, and Metadata.


NUTAN KUMAR PANDA 

An information security professional with expertise in the field of application and
network security. He has completed his BTech (IT) and has also earned various
prestigious certifications in his domain such as CEH, CCNA, etc. Apart from performing
security assessments he has also been involved in conducting/imparting information
security training. He has been listed in various prestigious Halls of Fame such
as Google, Microsoft, Yandex, etc. and has also written various articles/technical
papers. Currently he is working as an Information Security Engineer at eBay Inc.




Acknowledgments 


SUDHANSHU CHAUHAN 

I would like to dedicate this book to my family, my friends, and the whole security
community, which is so open in sharing knowledge. A few people I would like to
name who have encouraged and motivated me through this journey are Shubham,
Chandan, Sourav da, and especially my mother Kanchan.


NUTAN KUMAR PANDA 

I would like to dedicate this book to my parents and my lovely sister for believing in
me and encouraging me. My friend, well-wisher, and coauthor Sudhanshu for all
the help and support during the writing of this book, and last but not the least all my
friends, colleagues, and especially Somnath da and the members of Null: The Open
Security Community for always being there and giving their valuable suggestions in
this process. Love you all.





CHAPTER 1

Foundation: Understanding the Basics


INFORMATION IN THIS CHAPTER 

• Information overload 

• What is internet 

• How it works 

• What is World Wide Web 

• Basic underlying technologies 

• Environment 


INTRODUCTION 


Information Age. The period of human evolution in which we all are growing up.

Today the internet is an integral part of our life. We all have started living a dual life;
one is our physical life and the other is the online one, where we exist as a virtual entity.

In this virtual life we have different usernames, aliases, profile pictures, and what not
in different places. We share our information intentionally, and sometimes unintentionally,
in this virtual world of ours. If we ask ourselves how many websites we're
registered on, most probably we won't be able to answer that question with an exact
number. The definition of being social is changing from meeting people in person to
doing Google Hangouts and being online on different social networking sites. In the
current situation it seems that technology is evolving so fast that we need to cope
with its pace.

The evolution of computation power has been very rapid. From an era of a limited
amount of data we have reached times where there is information overload.

Today technologies like Big Data and Cloud Computing are the buzzwords of the IT
industry, both of which deal with handling huge amounts of data. This evolution
certainly has its pros as well as cons; from a data extraction point of view we need to
understand both and evaluate how we can utilize them to our advantage ethically.

The main obstacle in this path is not the deficiency of information but, surprisingly,
the abundance of it present at the touch of our fingertips. At this stage what we
require are relevant and efficient ways to extract actionable intelligence from this
enormous data ocean.









Extracting the data which could lead toward a fruitful result is like looking for a
needle in a haystack. Sometimes the information which could play a game-changing
role is present openly and free to access, yet if we don't know how to find it
in a timely fashion, or worse, don't know that it even exists, a huge amount of critical
resources gets wasted. During the course of this book we will be dealing with practical tools and
techniques which will not only help us to find information in a timely manner but
also help us to analyze such information for better decision making. This could make a
huge difference for the people dealing with such information as a part of their daily job,
such as pentesters, due diligence analysts, competitive intelligence professionals, etc.

Let's straightaway jump in and understand the internet we all have been using
for so long.


INTERNET 

Internet, as we know it has evolved from a project funded by DARPA within the 
US Department of Defense. The initial network was used to connect universities 
and research labs within the US. This phenomenon slowly developed worldwide and 
today it has taken the shape of the giant network which allows us to connect with the 
whole world within seconds. 

DEFINITION 

Simply said, the internet is a global network of interlinked computers using dedicated
routers and servers, which allows its end users to access data scattered all
over the world. These interconnected computers follow a particular set of rules to
communicate, in this case the internet protocol (IP), for transmitting data.


HOW IT WORKS 

If you bought this book and are reading it then you must already know how the
internet works, but still it's our duty to brush up on some basics, though not deeply.
As stated above, the internet is a global network of interconnected computers, and lots
of devices collaboratively make the internet work, for example routers, servers,
switches, along with other hardware like cables, antennas, etc. All these devices together
create the network of networks, over which all the data transmission takes place.

As in any communication you must have end points, a medium, and rules. The internet
also works around these concepts. End points are devices like a PC, laptop, tablet,
smartphone, or any other device a user uses. The medium or nodes are the different
dedicated servers and routers connected to each other, and protocols are sets of rules that
machines follow to complete tasks, such as transmission control protocol (TCP)/IP.
Some of the modes of transmission of data are telephone cables, optical fiber, radio
waves, etc.


Copyrighted material 




WORLD WIDE WEB 

World Wide Web (WWW), or simply the web, is a subset of the internet, or in
simple words it's just a part of the internet. The WWW consists of all the public
websites connected to the internet, including the client devices that access them.

It is basically a structure of interlinked documents, represented in the form of web
pages. These web pages can contain different media types such as plain text, images,
videos, etc. and are accessed using a client application, usually a web browser. It
consists of a huge number of such interconnected pages.


FUNDAMENTAL DIFFERENCES BETWEEN INTERNET AND WWW 

For most of us the web is synonymous with the internet; though it contributes to the
internet, it is still a part of it. The internet is the parent class of the WWW. In the web,
information and documents are linked by website uniform resource locators (URLs)
and hyperlinks. They are accessed by a browser on any end device, such as a PC or
smartphone, using hypertext transfer protocol (HTTP), and nowadays generally using
HTTPS. HTTP is one of the different protocols used on the internet, such as
file transfer protocol (FTP), simple mail transfer protocol (SMTP), etc., which will
be discussed later.

So now that we understand the basics of the internet and the web, we can move ahead
and learn about some of the basic terminologies/technologies which we will be
frequently using during the course of this book.


DEFINING THE BASIC TERMS 

IP ADDRESS 

Anyone who has ever used a computer must have heard about the term IP address.
Though some of us might not understand the technical details behind it, we all know
it is something associated with the computer's address. In simple words, an IP address is
the virtual address of a computer or a network device that uniquely identifies that
device in a network. If our device is connected to a network we can easily find out
the device's IP address. In case of Windows users it can simply be done by opening the
command prompt and typing the command "ipconfig". It's nearly the same for
Linux and Mac users: we have to open the terminal and type "ifconfig" to find out the
IP address associated with the system.

The IP address is also known as the logical address and is not permanent. The IP
address scheme popularly used is IPv4, though the newer version, IPv6, is soon catching
up. It is represented in dotted decimal notation, for example "192.168.0.1",
and ranges from 0.0.0.0 to 255.255.255.255.




When we try to find out the IP address associated with our system using any of the
methods mentioned above, we will find that the address lies within this range.

Broadly, IP addresses are of two types:

1. Private IP address 

2. Public IP address 

A private IP address is used to uniquely identify a device in a local area network;
for example, it is what makes our system unique in our office compared to other systems.
There are sets of IP addresses that are only used for private IP addressing:

10.0.0.0-10.255.255.255 

172.16.0.0-172.31.255.255 

192.168.0.0-192.168.255.255 

The above mentioned procedure can be used to check our private IP address.
A public IP address is an address which uniquely identifies a system on the internet.
It is generally provided by the Internet Service Provider (ISP). We can only check
this when our system is connected to the internet. The address can be anything other
than the private IP address ranges. We can check it on our system (regardless of the OS) by
browsing to "whatismyipaddress.com".
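Since the book relies on Python later on, here is a minimal Python 2.7 sketch of ours (not from the original text) that prints the private IP address of the machine without parsing ipconfig/ifconfig output. It assumes the system has a default route; the address 8.8.8.8 is just an arbitrary routable destination and no packet is actually sent over the UDP socket.

    import socket

    # Determine the local (private) IP address by opening a UDP socket
    # toward an arbitrary routable address; no data is actually sent.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("8.8.8.8", 53))
    print("Private IP address: " + s.getsockname()[0])
    s.close()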


PORT 

We are all aware of ports like USB ports, audio ports, etc., but here we are not talking
about hardware ports; what we are talking about is logical ports. In simple
words, a port can be defined as a communication endpoint. Earlier we discussed how an
IP address uniquely identifies a system in a network; when a port number is
added to the IP address, it completes the destination address, allowing us to communicate
with the destination system using the protocol associated with the provided
port number. We will soon discuss protocols, but for the time being
let's assume a protocol is a set of rules followed by all communicating parties for
the data exchange. Let's assume a website is running on a system with IP address
"192.168.0.2" and we want to communicate with that server from another system
connected to the same network with IP address "192.168.0.3". We just have to
open the browser and type "192.168.0.2:80", where "80" is the port number used
for communication, which is generally associated with the HTTP protocol. Ports are
generally application specific or process specific. Port numbers are within the range
0-65535.
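To make the IP-plus-port idea concrete, the following Python 2.7 sketch (our own illustration; the IP address and port are purely hypothetical, taken from the example above) checks whether a TCP port is open on a host by attempting a connection:

    import socket

    host = "192.168.0.2"   # hypothetical web server from the example above
    port = 80              # port commonly associated with HTTP

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    # connect_ex() returns 0 if the TCP connection succeeded, an error code otherwise
    if s.connect_ex((host, port)) == 0:
        print("Port %d is open on %s" % (port, host))
    else:
        print("Port %d appears closed or filtered on %s" % (port, host))
    s.close()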

PROTOCOL 

A protocol is a standard set of regulations and requirements used in communication
between source and destination systems. It specifies how to connect and exchange
data with one another. Simply stated, it is a set of rules followed for communication
between two entities over a medium.




Some popular protocols and their associated port numbers: 

• 20, 21 FTP (File Transfer Protocol): Used for file transfer 

• 22 SSH (Secure Shell): Used for secure data communication with another machine. 

• 23 Telnet (Telecommunication network): Used for data communication with another machine. 

• 25 SMTP (Simple Mail Transfer Protocol): Used for the management of e-mails. 

• 80 HTTP (Hyper Text Transfer Protocol): Used to transfer hypertext data (web).
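Python's standard socket module already knows many of these standard service-to-port mappings (taken from the system's services database), so a tiny sketch of ours can confirm the numbers listed above:

    import socket

    # Look up the standard TCP port assigned to each service name
    for service in ("ftp", "ssh", "telnet", "smtp", "http"):
        print("%s -> port %d" % (service, socket.getservbyname(service, "tcp")))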


MAC ADDRESS 

The MAC address is also known as the physical address. The MAC address, or media access
control address, is a unique value assigned to the network interface by its manufacturer.
The network interface is the interface used to connect the network cable. It is represented
as a hexadecimal number, for example "00:A2:BA:C1:2B:1C", where the first three
sets of hexadecimal characters are the manufacturer number and the rest is the serial
number. Now let's find the MAC address of our system.

In case of Windows users it can simply be done by opening the command prompt
and typing either "ipconfig /all" or "getmac". It's nearly the same for
Linux and Mac users: we have to open the terminal and type "ifconfig -a" to find out
the MAC address associated with the system. Now let's note down the MAC address/
physical address of the network interface of our system and find out the manufacturer
name. Search for the first three sets of hexadecimal characters in Google to get
the manufacturer name.
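Besides ipconfig /all and ifconfig -a, the MAC address can also be read programmatically. A minimal Python 2.7 sketch of ours (note that uuid.getnode() picks one interface and may return a random value if no hardware address can be read):

    import uuid

    # uuid.getnode() returns the MAC address of one interface as a 48-bit integer
    mac = uuid.getnode()
    mac_hex = "%012X" % mac
    # Format it in the familiar colon-separated notation, e.g. 00:A2:BA:C1:2B:1C
    print(":".join(mac_hex[i:i + 2] for i in range(0, 12, 2)))
    # The first three octets identify the manufacturer (the OUI)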

E-MAIL 

E-mail is the abbreviation of electronic mail, one of the most widely used technologies
for digital communication. It's a one-click solution for exchanging digital
messages from sender to receiver. The general structure of an e-mail address is
"username@domainname.com". The first part, which comes before the @ symbol, is the
username of the user who registered himself/herself to use that e-mail service.
The second part, after the @ symbol, is the domain name of the mail service provider.
Apart from this, nowadays every organization which has a website registered
with a domain name also creates a mail service to use, so if we work in a company
with the domain name "xyz.com", our company e-mail id would be "ourusername@
xyz.com". Some popular e-mail providers are Google, Yahoo, Rediff, AOL,
Outlook, etc.

DOMAIN NAME SYSTEM 

Domain name system (DNS), as the name suggests, is a naming system for the
resources connected to the internet. It maintains a hierarchical structure of this naming
scheme through a channel of various DNS servers scattered over the internet.

For example, let's take google.com; it's a domain name of Google Inc. Google
has its servers present in different locations and the different servers are uniquely
assigned different IP addresses.




It is difficult for a person to remember all the IP addresses of the different servers
he/she wants to connect to, so there comes DNS, allowing a user to remember just
the name instead of all those IP addresses. In this example we can easily divide the
domain name into two parts. The first part is the name generally associated with the
organization name or the purpose for which the domain is bought; here Google is the
organization name in google.com. The second part, or the suffix, explains the type of
the domain; here "com" is used for commercial or business purpose domains. These
suffixes are also known as top level domains (TLDs).


SOME EXAMPLES OF TLDS: 

• net: network organization 

• org: non-profit organization 

• edu: educational institutions 

• gov: government agencies 

• mil: military purpose 

One of the other popular suffix classes is the country code top level domain (ccTLD). Some examples are:

• in: India 

• us: United States 

• uk: United Kingdom 


DNS is an integral part of the internet as it acts as the yellow pages for it. We simply
need to remember the resource name and the DNS will resolve it into a virtual address
which can be easily accessed on the internet. For example, google.com resolves to
the IP address 74.125.236.137 for a specific region of the internet.
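The same resolution the browser performs can be done directly from Python; a small sketch of ours (the returned address depends on region and time, so it will most likely differ from 74.125.236.137):

    import socket

    domain = "google.com"
    # Ask the system resolver (and hence DNS) for an IP address for the domain
    print("%s resolves to %s" % (domain, socket.gethostbyname(domain)))
    # gethostbyname_ex() additionally returns aliases and the full list of addresses
    print(socket.gethostbyname_ex(domain))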

URL 

A URL or uniform resource locator can simply be understood as an address used to 
access the web resources. It is basically a web address. 

For example, http://www.example.com/test.jpg. This can be divided into five 
parts, which are: 

1. http 

2. www 

3. example 

4. com 

5. /test.jpg

The first part specifies the protocol used for communication, in this case HTTP; in
other cases other protocols such as HTTPS or FTP can be used. The second part is used
to specify whether the URL points to the main domain or a subdomain; www is generally
used for the main domain, and some popular subdomains are blog, mail, career, etc.




The third and fourth parts are associated with the domain name and the type of the
domain, which we just came across in the DNS part. The last part specifies a file,
"test.jpg", which needs to be accessed.
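The same breakdown can be automated with Python 2.7's urlparse module (urllib.parse in Python 3); a minimal sketch of ours:

    from urlparse import urlparse  # in Python 3: from urllib.parse import urlparse

    url = "http://www.example.com/test.jpg"
    parts = urlparse(url)
    print("Protocol : " + parts.scheme)   # http
    print("Host     : " + parts.netloc)   # www.example.com (subdomain + domain + TLD)
    print("Path     : " + parts.path)     # /test.jpg, the file to be accessed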

SERVER 

A server is a computer program which provides a specific type of service to other
programs. These other programs, known as clients, can be running on the same system
or in the same network. There are various kinds of servers, with different
hardware requirements depending upon factors like the number of clients, bandwidth,
etc. Some of the kinds of servers are:

Web server: Used for serving websites.

E-mail server: Used for hosting and managing e-mails.
File server: Used to host and manage file distribution.


WEB SEARCH ENGINE 

A web search engine is a software application which crawls the web to index it and 
provides the information based on the user search query. Some search engines go 
beyond that and also extract information from various open databases. Usually the 
search engines provide real-time results based upon the backend crawling and data 
analysis algorithm they use. The results of a search engine are usually represented in 
the form of URLs with an abstract. 

Apart from the usual web search engines, some search engines also index data from
various forums and other closed portals (which require login). Some search engines also collect
search results from various different search engines and provide them in a single interface.

WEB BROWSER 

A web browser is a client-side application which provides the end user with the capability
to interact with the web. A browser contains an address bar, where the user enters
the web address (URL); this request is then sent to the destination server and
the contents are displayed within the browser interface. The response to the request
sent by the client consists of raw data with an associated format for that data.

Earlier browsers had limited functionality, but nowadays, with various features
such as downloading content, bookmarking resources, saving credentials, etc., and
new add-ons coming up every day, browsers are becoming very powerful. The advent
of cloud-based applications has also hugely contributed to making browsers the most
widely used software.


VIRTUALIZATION 

Virtualization can be described as the technique of abstracting physical resources,
with the aim of simplification and ease of utilization of those resources.




It can consist of anything from a hardware platform to a storage device or OS, etc. Some of the
classifications of virtualization are:

Hardware/platform: Creation of a virtual machine that performs like an original
computer with an OS. The machine on which the virtualization takes place is
the host machine and the virtual machine is the guest machine.

Desktop: Concept of separating the logical desktop from the physical
machine. The user interacts with the host machine over a network using
another device.

Software: OS-level virtualization can be described as hosting multiple virtualization
environments within a single OS instance. Application virtualization is
hosting individual applications in an environment separated from the underlying
OS. In service virtualization the behavior of a dependent system component is
emulated.

Network: Creation of a virtualized network addressing space within or across
network subnets.


WEB BROWSING—BEHIND THE SCENE 

So now that we have put some light on the technological keywords that we will
be dealing with in later chapters, let's dive a little deeper and try to understand what
exactly happens when we try to browse a website. When we enter a URL in a browser
it divides it into two parts. Let's say we entered "http://www.example.com".
The two parts of this URL will be (1) http and (2) example.com. The reason for doing
so is to identify the protocol to be used and the domain name to resolve into an IP address.
Let's again assume that the IP address associated with the domain name example.com
is "192.168.1.4"; the browser will then process it as "192.168.1.4:80", as 80 is the
port number associated with the protocol HTTP.

In the section on DNS we already came across the fact that
it is used to resolve a domain name into an IP address, but how? It depends on whether
we are visiting a site for the first time or we visit it often, but in both
cases the procedure remains much the same. The DNS lookup starts with the browser cache,
checking whether any records are present, i.e., whether we visited this
site earlier or this is the first time. If the browser cache does not contain any information,
the browser makes a system call to check whether the OS has any DNS
record in its cache. Similarly, if nothing is found there, it searches for the same DNS
information in the router cache, and then the ISP DNS cache; finally, if no
DNS record is found in any of these places, a recursive search starts from the root name server
down through the top level name servers to resolve the domain name. One thing we need to keep
in mind is that some domain names are associated with multiple IP addresses, such
as google.com; in that case it still returns only one IP address, based on the
geographic location of the user who intends to use that resource. This technique is
also known as geographic DNS.



In the above paragraph we understood how the DNS lookup searches for information
starting from the browser cache, but that only applies to sites which are static, because dynamic sites
contain dynamic content that expires quickly. However, the process is much the same
in both cases.

After DNS resolution, the browser opens a TCP connection to the server and sends a
hypertext request based on the protocol mentioned in the URL; as it is HTTP in our
case, the browser will send an HTTP GET request to the server through the TCP connection.
The browser will then receive an HTTP response from the server with a status code. In
simple words, status codes define the server status for the request. There are different
types of status codes, and that is a huge topic on its own; hence, just for our understanding,
we include some of the popular status codes that a user might encounter
while browsing.


FIGURE 1.1

Web browsing—behind the scene. (Diagram: the user inputs a URL such as http://www.example.com/, the DNS resolution process maps it to an IP address, e.g., 192.168.1.4 on port 80, and the browser then receives an HTTP response from the server.)


































HTTP STATUS CODE CLASSES 

They lie between 100 and 505 and are categorized into different classes according to their first digit.

• 1xx: Informational

• 2xx: Successful 

• 3xx: Redirection 

• 4xx: Client-error 

• 5xx: Server-error 
Some popular status codes: 

• 100: continue 

• 200:ok 

• 301: moved permanently 

• 302: found 

• 400: bad request 

• 401: unauthorized 

• 403: forbidden 

• 404: not found 

• 500: internal server error 

• 502: bad gateway 


If the browser gets an error status code then it fails to get the resources properly; if
not, it renders the response body. The response body generally contains the HTML
code for the page contents and links to other resources, which further undergo the
same process. If the response page is cacheable it will be stored in the cache. This is the
overall process that takes place in the background when we try to browse something on the
internet using a browser.
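The request/response exchange described above can be reproduced in a few lines of Python 2.7 using the standard httplib module (http.client in Python 3). This is a hedged sketch of ours, not the book's code; the host is the same example.com used above.

    import httplib

    # Open a TCP connection to the web server on port 80 and send an HTTP GET request
    conn = httplib.HTTPConnection("www.example.com", 80, timeout=5)
    conn.request("GET", "/")
    response = conn.getresponse()

    # The status code and reason phrase returned by the server, e.g. 200 OK
    print("%d %s" % (response.status, response.reason))
    # The response body, typically the HTML that a browser would render
    body = response.read()
    print("Received %d bytes of content" % len(body))
    conn.close()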


LAB ENVIRONMENT 

As we have discussed the basic concepts, now let’s move ahead and understand the 
environment for our future engagements. 

OPERATING SYSTEM 

For a computer system to run we need basic hardware such as a motherboard, RAM,
hard disk, etc., but hardware is worthless until there is an OS to run over it. An operating
system is basically a collection of software which can manage the underlying
hardware and provide basic services to the users.

Windows 

One of the most widely used OSs, introduced by Microsoft in 1985. After so many
years it has evolved to a very mature stage. The current version is Windows 8.1.
Though it has had its fair share of criticism, it holds a major percentage of the
market share. The ease of usability is one of the major features of this OS which
makes it widely acceptable.




Though during the writing of this book we were using Windows 7 64-bit, any
version above 7 will also be fine and will function in a more or less similar fashion.

Linux 

Popular as the OS of geeks, this OS is available in many flavors. Mostly it is used for
servers due to the stability and security it provides, but it is also popular among developers,
system admins, security professionals, etc. Though it surely seems a bit different
as well as difficult to use for an average user, today it has evolved to a level where
the graphical user interface (GUI) provided by some of its flavors is on par with
Windows and Mac interfaces. The power of this OS lies in its terminal (command line
interface), which allows us to utilize all the functionality provided by the system.

We will be using Kali Linux (http://www.kali.org/), a penetration testing distribution,
during this book. It is based on Debian, which is a well-known, stable flavor of
Linux. Other flavors such as Ubuntu, Arch Linux, etc. can also be used, as
most of the commands will be similar.

Mac 

Developed by Apple, this series of OSs is well known for its distinctively sleek design.
In the past it has faced criticism due to the limited options available on the software front,
but as of today there is a wide range of options available. It is said to be more secure
as compared to its counterparts (in the average use domain), yet it has faced some severe
security issues.

Mac provides a powerful graphical user interface (GUI) as well as a CLI, which
makes it a good choice for any computing operation. Though we were using Mac OS
X 10.8.2 during the writing of this book, any later version will also be fine for practice.

Most of the tools which will be used during the course of this book will be free/
open source and also platform independent, though there will be some exceptions
which will be pointed out as and when they come into play. It is recommended to have
a virtual machine of a different OS type (discussed above) apart from the base system.

To create a virtual machine we can use virtualization software such as
VirtualBox or VMware Player. Oracle VirtualBox can be downloaded from https://
www.virtualbox.org/wiki/Downloads. VMware Player can be downloaded from
http://www.vmware.com/go/downloadplayer/.


PROGRAMMING LANGUAGE 

A programming language is basically a set of instructions which allows us to communicate
commands to a computing machine. Using a programming language we can
control the behavior of a machine and automate processes.

Java 

Java is a high-level, object-oriented programming language developed by Sun Microsystems,
now Oracle.




Due to the stability it provides, it is heavily used to develop applications following
client-server architecture. It is one of the most popular programming languages as of today.

Java is required to run many browser-based as well as other applications and runs 
on a variety of platforms such as Windows, Linux, and Mac. 

The latest version of Java can be downloaded from https://www.java.com/en/download/manual.jsp.

Python 

A high-level programming language, which is often used for creating small and efficient
scripts. It is also used widely for web development. Python follows the philosophy
of code readability, which means indentation is an integral part of it.

The huge amount of community support and the availability of third party libraries
make it the preferred language of choice for most people who frequently
need to automate small tasks. This does not mean that Python is not powerful
enough to create full-fledged applications, and Django, a Python-based web framework,
is a concrete example of that. We will discuss Python programming in detail
in a later chapter.

The current version of Python is 3.4.0, though we will be using version 2.7 as the
3.x series has had some major changes and is not backward compatible. Most of the
scripts we will be using/writing will use the 2.7 version. It can be downloaded
from https://www.python.org/download/releases/2.7.6/.
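As a quick sanity check of the installation (a trivial example of ours, not from the book; the version string will obviously differ per system), the classic hello world plus a version print can be run in the interactive interpreter or saved as a .py file:

    import sys

    # Prints a greeting and the interpreter version, confirming Python is installed correctly
    print("Hello, world!")
    print("Running Python " + sys.version.split()[0])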

BROWSER 

As discussed above, a browser is a software application which is installed at the
client's end and allows the user to interact with the web.

Chrome 

Developed by Google, it is one of the most widely used browsers. First released
in 2008, today this browser has evolved to a very stable release and has left the
competition way behind. Most of its base code is available online in the form of Chromium
(http://www.chromium.org/Home).

Today Chrome is available for almost all devices which are used for web surfing,
be it a laptop, a tablet, or a smartphone. The ease of usability, stability, security, and
add-on features provided by Chrome clearly make it one of the best browsers
available. It can be downloaded from https://www.google.com/intl/en/chrome/browser/.

Firefox 

Firefox is another free web browser and is developed by the Mozilla Foundation. The
customization provided by Firefox allows you to modify it to your desire. One of the
greatest features of Firefox is the huge list of browser add-ons, which allows you to tailor
it for specific requirements. Similar to Chrome, it is available for various platforms. It
can be downloaded from https://www.mozilla.org/en-US/firefox/all/.




In this book we will mainly be using Chrome and Firefox as our browsers of
choice. In a later chapter we will customize both to suit our needs and will also
try out some already modified versions.

So in this chapter we have understood the basic technologies as well as the environment
we will be using. The main motivation behind this chapter is to build the
foundation so that once we are deep into our main agenda, i.e., web intelligence, we
have a clear understanding of what we are dealing with. The basic setup we have suggested
is very generic and easy to create. It does not require too many installations
at the initial stage; the tools which will be used later will be described as they
come into play. In the forthcoming chapter we will be diving deep into the details of
open source intelligence.




CHAPTER 2

Open Source Intelligence and Advanced Social Media Search

INFORMATION IN THIS CHAPTER 

• Open source intelligence (OSINT) 

• Web 2.0 

• Social media intelligence (SOCMINT) 

• Advanced social media search 

• Web 3.0 



INTRODUCTION 

As we already covered the basic yet essential terms in some detail in the previous
chapter, it's time to move on to the core topic of this book, that is open
source intelligence, also known by its acronym OSINT; but before that we need to
recognize how we see the information available in public and up to what extent we see it.

For most of us the internet is limited to the results of the search engine of our choice.

If we talk about a normal user who wants some information from the internet, he/
she directly goes to a search engine; let's assume it's one of the most popular search
engines, Google, and puts in a simple search query. A normal user, unaware of the advanced
search mechanisms provided by Google or its counterparts, puts in simple queries he/
she feels comfortable with and gets a result out of it. Sometimes it becomes difficult
to get the information from a search engine due to poor formation of the input queries.

For example, if a user wants to search for a Windows blue screen error troubleshoot,
he/she generally enters in the search engine query bar "my laptop screen is gone blue
how to fix this"; now this query might or might not be able to get the desired result
in the first page of the search engine, which can be a bit annoying at times. It's quite
easy to get the desired information from the internet, but we need to know from where
and how to collect that information efficiently. A common misconception among
users is that the search engine he/she prefers has the whole internet inside it, but in
a real scenario search engines like Google have only a minor portion of the internet
indexed. Another common practice is that people don't go to the results on page two
of a search engine. We all have heard the joke made about this: "if you want to hide a
dead body then Google results page two is the safest place." So we want all our readers
to clear their minds if they also think the same way, before proceeding with the topic.









OPEN SOURCE INTELLIGENCE 

Simply stated, open source intelligence (OSINT) is the intelligence collected from
sources which are openly present in public. As opposed to most other intelligence
collection methods, this form does not utilize information which is covert and
hence does not require the same level of stealth in the process (though some stealth
is required sometimes).


OSINT comprises of various public sources, such as: 

• Academic publications: research papers, conference publications, etc. 

• Media sources: newspaper, radio channels, television, etc. 

• Web content: websites, social media, etc. 

• Public data: open government documents, public companies announcements, etc. 


Some people don’t give much heed to this, yet it has proven its importance time 
and again. Most of the time it is very helpful in providing a context to the intel¬ 
ligence provided from other modes but that’s not all, in many scenarios it has been 
able to provide intelligence which can directly be used to make a strategic decision. 
It is thought to be one of the simplest and easiest modes by many if not most, yet it 
does has its difficulties; one of the biggest and unique out of all is the abundance of 
data. Where other forms of intelligence starve for data, OSINT has so much data that 
filtering it out and converting it into an actionable form is the most challenging part. 

OSINT has been used for long time by government, military as well as the cor¬ 
porate world to keep an eye on the competition and to have a competitive advantage 
over them. 

As we discussed, for OSINT there are various different public sources from which
we can collect intelligence, but during the course of this book we will be focusing
on the part which only uses the internet as its medium. This specific type of OSINT is
called WEBINT by many, though it seems a bit ambiguous as there is a difference
between the internet and the web (discussed in Chapter 1). It might look like
by focusing on a specific type we are missing a huge part of OSINT, which would
have been correct a few decades earlier, but today, where most of the data is digitized,
this line of difference is slowly thinning. So for the sake of understanding we will be
using the terms WEBINT and OSINT interchangeably during this book.


HOW WE COMMONLY ACCESS OSINT 

SEARCH ENGINES 

Search engines are one of the most common and easy methods of utilizing OSINT.
Every day we make hundreds of search queries in one or more search engines,
depending upon our preference, and use the search results for some purpose.



Though the results we get seem simple, there is a lot of backend indexing going on based
on complex algorithms. The way we create our queries makes a huge difference in the
accuracy of the results that we actually seek from a search engine. In a later chapter
we will discuss how to craft our queries so that we can precisely get the results that
we desire. Google, Yahoo, and Bing are well-known examples of search engines.

Though it seems like search engines have lots of information, they only index
the data which they are able to crawl through programs known as spiders or robots.
The part of the web these spiders are able to crawl is called the surface web; the
rest of it is called the dark web or darknet. The darknet is not indexed as it
is not directly accessible via a link. An example of darknet content is a page generated
dynamically using the search option on a web page. We will discuss the darknet and
associated terms in a later chapter.

NEWS SITES 

Earlier the popular mediums of news were newspapers, radio, and television, but
advancements in internet technology have drastically changed the scenario, and
today every major news vendor has a website where we can get all the news in a
digital format. Today there even exist news agencies which only run online. This
advancement has certainly brought news to the touch of our fingertips at any time,
anywhere there is an internet connection available. For example, bbc.com is
the news website of the well-known British Broadcasting Corporation.

Apart from news vendors, there are sites run by individuals or groups as well,
and some of them focus on topics which belong to specific categories. These sites
are mainly present in the form of blogs, online groups, forums, or IRCs (Internet Relay
Chat), etc., and are very helpful when we need the opinion of the masses on a specific
topic.

CORPORATE WEBSITES 

Every major corporation today runs a website. It's not just a way to present your existence
but also to interact directly with customers, understand their behavior, and much
more. For example, www.gm.com is the corporate website of General Motors. We can
find a plethora of information about a company from its website. Usually a corporate
website contains information like the key players in the organization, their e-mails,
company address, company telephone, etc., which can be used to extract further information.

Today some corporate websites also provide information in the form of
white papers, research papers, corporate blogs, newsletter subscriptions, current
clients, etc. This information is very helpful in understanding not only the current
state of the company but also its future plans and growth.

CONTENT SHARING WEBSITES 

There are various types of user-generated content out there which contain
an amalgam of text as well as various different multimedia files.




Yet there are some sites which allow us to share a specific type of content, such as videos,
photos, art, etc. These types of sites are very helpful when we need a specific type of media
related to a topic, as we know exactly where to find it. YouTube and Flickr are good examples
of such sites.

ACADEMIC SITES 

Academic sites usually contain information on some specific topics, research
papers, future developments, news related to a specific domain, etc. In most cases
this information can be very crucial in understanding the landscape of current
as well as future development. Academic sites are also helpful in learning traits
which are associated with our field of interest and also in understanding the correlation
in between.

The information provided on academic sites is very helpful in understanding
the developments that are taking place in a specific domain and also in getting a glimpse
of our future. They are not only helpful in understanding the current state of development
but also in generating ideas based upon it.


BLOGS 

Weblogs or blogs started as a digital form of personal diary, except that they are public.
Usually people used to write blogs simply to express their views on some
topics of interest, but this has changed in the past decade. Today there are corporate
blogs, which talk about the views of the company and can reveal a lot about its pursuits;
there are blogs on specific topics which can be used to learn about the topic;
there are blogs related to events, etc.



FIGURE 2.1 

A blog on bylogger.in. 




Blogs reveal a lot not just about the topic written about, but also about their authors.
In many job applications it is considered desirable for the applicant to have a blog, as it can be used
to understand his/her basic psychological profile, communication skills, command
over the language, etc.


GOVERNMENT SITES 

Government sites contain a huge amount of public data. This includes not just information
about the government but also about the people it is serving. Usually there
are government sites which contain information about registered companies, their
directors, and other corporate information; then there are sites which contain information
about specific departments of the government; there are also sites where we
can complain regarding public issues and check the status of our complaints; etc.

From a geopolitics perspective, government sites can be a good source of information
about the development of a country, its current advancements, its future plans, etc.

So now this is how we usually interact with the internet today, but it was not
always like this. There were no blogs, no social media, no content sharing, etc., so
how did we get here? Let's see.


WEB 2.0 

Earlier, websites used to be mainly static; there was not much to interact with. Users simply opened the web pages, went through the text and images, and that was pretty much it. Around the late 1990s the web started to take a new form. The static pages were being replaced by user-generated content. Websites became interactive and people started to collaborate online. This was the advent of Web 2.0.

Web 2.0 drastically changed the way the web was interacted with. Earlier, the content shared by webmasters was the only information one could access; now people could post data on the web, and opinions were being shared and challenged. This changed the way information was generated; now there were multiple sources to confirm or discredit a piece of data. People could share information about themselves, their connections, their environment, and everything they interacted with.

Now people were not just the viewers of the content of the web but also its creators. The ability to interact and collaborate allowed people to create new platforms for sharing information and connecting with each other in this virtual space. Platforms like content sharing portals, social networks, weblogs, wikis, etc. started to come into existence. The virtual world slowly started to become our second home and a source of a plethora of information which would not have existed earlier.

This virtual world is now our reality. The ability to create content here allows us to share whatever information we want: our personal information, our professional information, our feelings, our likes/dislikes, and what not. Here we can tell others about ourselves and at the same time learn about others. We can share our views about anything and understand how other people perceive those issues. It allows us to interact with the whole world while sitting in one corner of it.




Today on these social platforms of ours it's not just individuals who exist; there is much more. There are people in the form of communities and/or groups; there are pages of political parties, corporates, products, etc. Everything we used to deal with in real life is being replicated in the virtual world. This certainly has brought the world closer in a sense, and it does affect our lives.

The web at its current stage is not only a part of our life but also influences it. By sharing our feelings, desires, and likes/dislikes online we let others know about us and understand our personality, and vice versa. Similarly, the content posted online plays a huge role in our decision making. The advertisements we see online are personalized, depend upon our online behavior, and influence what we buy. Be it a political hashtag on Twitter or a viral video on YouTube, we process a lot of online data daily and it does make a difference in our decisions.

Today the web has evolved to a level where there is an abundance of data, which is a good thing as it increases the probability of finding the answers to our questions. The issue is how to extract relevant information out of this mammoth collection, and this is exactly what we will be dealing with in this book, starting from this chapter.


SOCIAL MEDIA INTELLIGENCE 

Social media is an integral part of the web as we know it. It is mostly where all the user-generated content resides. Social media intelligence, or SOCMINT, is the name given to the intelligence collected from social media sites. Some of these may be open, accessible without any kind of authentication, and some might require some kind of authentication before any information can be fetched. Due to its partially closed nature some people don't count it as a part of OSINT, but for the sake of simplicity we will consider it so.


Some social media types are: 

• Blogs (e.g., Blogger)

• Social network websites (e.g., Facebook) 

• Media sharing communities (e.g., Flickr) 

• Collaborative projects (e.g., Wikipedia) 


Now that we have a clear idea about OSINT as well as social media from its perspective, let's move on to understand one of the integral parts of social media and a common source of information sharing, i.e., social networks.


SOCIAL NETWORK 

A social network website is a platform which allows its users to connect with each other 
depending upon their area of interest, location they reside in, real life relations, etc. 
Today they are so popular that almost every internet user has a presence on one or more 
of these. Using such websites we can create a social profile of our own, share updates, 
and also check profiles of other people in which we have some form of interest. 




Some of the common features of social network websites are: 

• Share personal information 

• Create/join a group of interest 

• Comment on shared updates 

• Communicate via chat or personal message 


Such websites have been very helpful in connecting people across boundaries, building new relations, sharing ideas, and much more. They are also very helpful in understanding an individual: their personality, ideas, likes/dislikes, and what not.

INTRODUCTION TO VARIOUS SOCIAL NETWORKS 

There are several popular social network sites where we are already registered, but why are there so many different social network sites, why not just a couple of them? The reason is that different social networks focus on different aspects of life. Some focus on generic real-life relations and interests, like Facebook, Google+, etc. Some focus on the business or professional aspect, like LinkedIn, and some on microblogging or quick sharing of views, like Twitter. There are many more popular social networks with different aspects, but in this chapter we will restrict ourselves to some of the popular ones, which are:

• Facebook

• LinkedIn

• Twitter

• Google+


FIGURE 2.2

Social network sites.




Facebook 

Facebook is one of the most popular and widely used social network sites. It was founded on February 4, 2004 by Mark Zuckerberg with his college roommates. Initially Facebook was restricted to Harvard University students, but now it is open for anyone above the age of 13 to register and use, though no proof of age is required. Among all other social network sites it has the widest age-group audience due to some of its popular features and generic aspects. Currently it has over a billion active users worldwide and adds over half a petabyte of data every 24 h.

It allows us to create a personal profile where we can provide details like work and education, personal skills, relationship status, family member details, basic information like gender, date of birth, etc.; contact information like e-mail id, website details, etc.; and also life events. It also allows us to create a page for personal or business use which can be used as a profile. We can also create groups, join groups of our interest, add other Facebook users based on relations or common interests, and categorize these friends. We can like something, comment on something, share what we feel as a status, check in where we were, share what we are doing right now, and add pictures and videos. We can also exchange messages with someone or with a group, publicly or privately, and chat with someone. Adding notes, creating events, and playing games are some of its other features.

Now you might be wondering why we are going over all this, because as Facebook users most of us are already aware of these things. The reason to highlight these features is that they will help us in OSINT. As we discussed earlier, Facebook adds over half a petabyte of data every 24 h, it has more than a billion active users, and it allows users to share almost everything. Combining these three statements, we can say Facebook contains petabytes of structured data on over a billion users: what a user likes, a user's basic information such as his/her name, age, gender, current city, hometown, work status, relationship status, current check-ins and where he/she visited recently; everything that is a treasure trove for any information gathering exercise. Though mostly we don't use Facebook for hardcore intelligence purposes, we still use its search facility sometimes to look for a person or a page. For example, one day we remember a school friend and want to find him/her on Facebook, so we search his/her name, or his/her name along with the school name, to get the result. Another option is that if there is a group for the schoolmates, we can go there directly and search for the friend. Based on our preferences, location, school, college, colleagues, and friends, Facebook also recommends friends through its "People You May Know" option. This option also helps a lot when searching for someone on Facebook. We will cover advanced ways of searching Facebook in an upcoming topic.

Facebook does allow setting privacy on most of the things mentioned above, like whom you want to share this information with: public, friends and friends of friends, only friends, or only me. It also allows users to block inappropriate content or users and to report spam and inappropriate content. But guess what, most of us are either unaware of these functionalities or simply ignore them.




LinkedIn

If you are a job seeker, jobholder, job provider, or business person, LinkedIn is the best place to stay active. It can be called a professional network where people are mostly interested in business-oriented content. It has more than 259 million members in over 200 countries and territories.

LinkedIn allows us to register and create a profile. The profile basically consists of name, company name, position, current work location, current industry type, etc. Here we can also add details about our work like job position and responsibilities, educational details, honor and award details, publications, certificates, skills, endorsements, projects undertaken, and languages known; almost our complete professional life. Apart from that, LinkedIn also allows us to add personal details such as date of birth, marital status, interests, and contact information, which can be of concern to certain employers.

Like Facebook, it also allows us to connect with other users with similar interests or with whom we have some level of relationship. To maintain professional decorum, LinkedIn restricts us from inviting others if we have received too many responses like "I don't know" or spam reports for our connection requests. Similar to Facebook, there are also different groups on LinkedIn which we can join to share a common interest. It also provides features to like, comment on, and share whatever we want, and to communicate with our connections via private message. One simple yet rich feature of LinkedIn over Facebook is that while in Facebook we can only see the mutual friends between two users, LinkedIn shows us how we are connected with a particular user just by visiting his or her profile. It also shows what things we have in common, so that we can easily understand in what way and to what extent the other user is similar to us. Another major difference is that on LinkedIn, if we sneak into someone's profile, that user will get to know that someone has viewed his/her profile. Though this can be set to partially or fully anonymous using the privacy settings, it is still a very good feature for a professional network. Let's say we are a job seeker and some recruiter just looked at our profile; then we can expect a job offer. Like Facebook, LinkedIn also allows us to set a privacy policy on almost everything.

LinkedIn is a great place for job seekers as well as job providers. The profile can be used as a bio-data/resume or CV, where a recruiter can directly search for candidates based on the required skill set. Other than that, it also has a jobs page where we can search for or post jobs. We can also search for jobs based on our current industry type or the companies we follow. A job seeker can search for jobs based on location, keyword, job title, or company name.

Now from an OSINT perspective, like Facebook, LinkedIn also has a lot of structured information, or we can say structured professional information, about a particular user and company, such as full name, current company, past experience, skill sets, industry type, details of a company's other employees, company details, etc., and using some advanced search techniques in LinkedIn we can collect all this information efficiently, which we will discuss soon.




Twitter 

Twitter is a microblogging type of social network. It allows us to read short messages of 140 characters (or less), known as tweets, without registration, but after logging in we can both read as well as compose tweets. It is also known as the SMS of the internet.

Unlike other social network sites, Twitter has a user base which is very diverse in nature. Nowadays Twitter is considered the voice of a person. Tweets are treated as statements and become part of news bulletins, etc.

The major reason why it is considered the voice of a person is its verified accounts. A verified account is a feature of Twitter which allows celebrities or public figures to show the world that it is their real account, though sometimes they also verify their account just to maintain control over the account that bears their name.

Like other social networking sites, when we register on Twitter it allows us to create a profile, though it contains very limited information like name, Twitter handle, a status message, website details, etc.

A Twitter handle is like a username which uniquely identifies us on Twitter. When we want to communicate with each other we use this Twitter handle. A Twitter handle generally starts with an @ sign followed by alphanumeric characters without spaces, for example, @myTwitterhandle. Twitter allows us to send a message to another user privately via direct messages or publicly via tweets; it also allows us to group a tweet or topic by using the hashtag "#". A hashtag is used as a prefix of any word or phrase, such as #LOL, which is generally used to group a tweet or topic under the funny category.

A word, phrase, or topic that is tagged the most within a time period is said to be a trending topic. This feature allows us to know what is happening in the world. Twitter allows us to follow other users. We can tweet, or simply share someone's tweet, which is known as retweeting. It also allows us to favorite a tweet. Like other social network sites, it also allows us to share images and videos (with certain restrictions). Tweet visibility is public by default, but if users want they can restrict their tweets to just their followers. Twitter is nowadays popularly used for making announcements, giving verdicts or statements, or replying to something online. The tweets of a verified account are taken as direct statements of that person. Corporates use it for advertising, self-promotion, and/or announcements.

Unlike the two social networks we discussed earlier, Twitter does not contain much personal or professional data, yet the information it provides is helpful. We can collect information about social mentions; for example, if we want details about infosec bug bounties, we can search Twitter with a hashtag and we will get lots of related tweets from which we can collect information such as which companies run bug bounties, what new bug bounties have started, who is participating in bug bounties, etc. Unlike other social network sites, Twitter has a large amount of structured information organized by phrases, words, or topics.

Google+ 

Google+, also known as Google Plus, is a social networking site by Google Inc. It is also known as an identity service which allows us to associate directly with the web content created by us. It is also the second largest social networking site after Facebook, with billions of registered and active users. As Google provides various services such as Gmail, the Play Store, YouTube, Google Wallet, etc., a Google+ account can be used as a background account for all of these.

Like the other social networking sites we just came across, Google+ also allows us to register, but the advantage that Google+ has over other social networking sites is that most Gmail (the popular e-mail solution by Google) users will automatically become a part of it with just a click. Like other social network sites, we can create a profile which contains basic information like name, educational details, etc.

Unlike other social networking sites, a Google+ profile is by default a public profile. It allows video conferencing via Google Hangouts. It allows us to customize our profile page by adding links to other social media properties we own, like a blog.

We can consider it one background solution for many Google services, but it has its own demerit. Many users have one or more Gmail accounts that they actively use, and with Google+ they might have the same number of accounts, but they can use only one as an active account. So there is a chance that the ratio of registered accounts to active users might be very low as compared to other social networking sites.

Like its competitors, Google+ also allows us to create and join communities, follow or add friends, and share photos, videos, or locations, but one feature that makes Google+ a better social networking site is its +1 button. It's quite similar to the Like button in Facebook, but the added advantage is that when the +1 count is higher for a topic or a link, it also increases its PageRank in Google.

Now to the OSINT aspect of Google+: like other social networking sites, Google+ also has a huge amount of structured data on billions of users. Another feature that makes Google+ a better source for information gathering is that the profiles are public, so no authentication is required to get information. One more advantage of Google+ over other social sources is that it's a one-stop solution; here we can get information about all the Google content a user is contributing, or at least the other Google services a user is using. This can be a veritable treasure trove.


ADVANCED SEARCH TECHNIQUES FOR SOME SPECIFIC 
SOCIAL MEDIA 

Most of the social media sites provide some kind of search functionality to allow us to search for the things or people we are interested in. These functionalities, if used in a slightly smarter way, can be used to collect hidden or indirect but important information, thanks to the structured storage of user data in these social media.


FACEBOOK 

We already discussed how Facebook can be a treasure trove for information gathering. One functionality that helps us get very precious information is Facebook graph search.




Facebook graph search is a unique feature that enables us to search for people or things that are somehow related to us. We can use graph search to explore locations, places, and photos, and to search for different people. It has its unique way of suggesting what we want to search for based on the first letters or words. It starts searching for an item in different categories of Facebook itself, such as people, pages, groups, places, etc., and if sufficient results are not found it searches the Bing search engine to provide the user with enough results. To provide the most relevant results, Facebook also looks at our relations, or at least our areas of interest and past activity; for example, things that have been liked, commented on, shared, tagged, checked in to, or viewed, either directly by us or by our friends, rank higher in the results. We can also filter the results based on social elements such as people, pages, places, groups, apps, events, and web results. The technology that Facebook uses in its graph search can be seen as the basis of the semantic web, which we will discuss at the end of this chapter.

Though we have now learned about the feature that allows us to search for different things on Facebook, the question remains: how? Let's start with some simple queries.

Just put photos in the search bar and Facebook will suggest some queries such as photos of my friends, photos liked by me, my photos, photos of X, etc. Similarly we can get lots of photo-related queries, or we may create our own. So based on photos, we can ultimately arrive at a query such as "Photos taken in Bangalore, India commented on by my friends who graduated before 2013 in Bhubaneswar, India." It's basically up to our own imagination what exactly we want to retrieve; then, based on keywords, we can create complex queries to get the desired results, though Facebook will also suggest some unexpected queries based on the keywords mentioned in the search bar. Similarly we can search for persons, locations, restaurants, employees of a particular company, music, etc.

Some basic queries related to different social elements are as follows:

1. Music I may like

2. Cities my friends visited

3. Restaurants in Macao, China

4. People who follow me

5. Single females who live in the United States

6. My friends who like X-Men movies

7. People who like football

Now let's combine some of these simpler queries to create a complex one: "Single women named 'Rachel' from Los Angeles, California who like football and Game of Thrones and live in the United States." Isn't it amazing! We can create queries using the following kinds of filters: basic information such as name, age, and gender; work and education such as class, college passing year, and degree name; likes and dislikes; tagged in and commented on; places lived; and also relationships. It's our wild imagination that can lead us to create different queries to get the desired result.
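Since graph search queries are just natural-language phrases assembled from filters, they can also be composed programmatically before being pasted into the search bar. The following Python sketch is purely our own illustration (the function name and example values are not part of any Facebook API); it simply joins a base subject with a list of filter clauses:

    # Illustrative sketch only: compose a Facebook graph search phrase from a
    # subject and a list of filter clauses. It just builds the text to paste
    # into the Facebook search bar; it does not call any Facebook API.

    def build_graph_query(subject, filters):
        return " ".join([subject] + list(filters))

    query = build_graph_query(
        "Single women named 'Rachel' from Los Angeles, California",
        ["who like Football and Game of Thrones",
         "and live in the United States"],
    )
    print(query)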




FIGURE 2.3 

Facebook graph search result. 

LINKEDIN 

As we discussed, LinkedIn holds structured data on its millions of users, and we have seen what we can get if we search for something in particular, so let's see how to search this platform. LinkedIn provides a search bar at the top to search for people, jobs, companies, groups, universities, articles, and much more. Unlike Facebook, LinkedIn has a dedicated advanced search page where we can add filters to get more precise results. Following is the page link for advanced search in LinkedIn:

https://www.LinkedIn.com/vsearch/p?trk=advsrch&adv=true 




FIGURE 2.4 

LinkedIn advanced search options.




This advanced search page allows us to search for jobs and people based on current company, past company, job title, zip code radius, interested in, industry type, etc. It also allows us to search based on the type of connection.

Different input boxes and their uses 

• Keyword 

The keyword input box allows a user to insert any type of keyword such as 
pentester or author, etc. 

• First Name 

We can search using first name. 

• Last Name 

We can search using last name. 

• Title 

Title generally refers to the work title. A drop-down menu with four options (current or past, current, past, past not current) is provided to refine the search.

• Company 

We can search using company name. It also comes with a drop down menu with 
the options we just discussed. 

• Location 

This drop-down box comes with two options, i.e., located in or near, and anywhere. The user can choose whichever he/she wants.

• Country 

Search based on country. 

• Postal Code 

Search based on postal code. There is a lookup button for the user to check whether the entered postal code corresponds to the desired location or not. On entering a postal code, a "within" drop-down box is automatically enabled, which contains the following options to choose from:

1. 10mi (15km)

2. 25mi (40km)

3. 35mi (55km)

4. 50mi (80km)

5. 75mi (120km)

6. 100mi (160km)

This can be used to select the radius you want to include in the search along with the postal code.

• Relationship 

This set of checkboxes contains options to enable direct connection search, connection-of-connection search, group search, and search over everyone. The user can enable the final option, i.e., 3rd + Everyone Else, to search everything.

• Location 

This option is for adding a location in addition to the one already specified via the postal code.




• Current Company 

This option allows a user to add current company details manually. 

• Industry 

It provides a user with different options to choose one or more at a time. 

• Past Company 

This option allows us to add past company details manually.

• School 

Similar to past company we can add details manually. 

• Profile Language

It allows a user to choose one or more languages at a time.

• Nonprofit Interests

It allows a user to choose either board service or skilled volunteering or both.

The options present on the right side of the advanced search page are only for premium account members. There are other added functionalities also present only for premium users.

The premium member search filter options are 

• Groups 

• Years of Experience 

• Function 

• Seniority Level 

• Interested In 

• Company Size 

• Fortune 

• When Joined 

Apart from all these, LinkedIn also allows us to use Boolean operators. Below are the operators with simple examples:

• AND: It can be used to require both keywords, such as developer AND tester.

• OR: It can be used for alternatives. Let's say a recruiter wants to recruit someone for the security industry; he/she can search something like pentester OR "security analyst" OR "consultant" OR "security consultant" OR "information security engineer."

• NOT: This can be used to exclude something. Let's say a recruiter wants a fresher-level person for some job but not from the training domain; he/she can use developer NOT trainer.

• (Parentheses): This is a powerful operator with which a user can group terms, such as (Pentester OR "Security Analyst" OR "Consultant" OR "Security Consultant" OR "Information Security Engineer") NOT Manager.

• "Quotation": It can be used to treat more than one word as a single keyword, such as "Information Security Engineer." If we use the same words without quotation marks, LinkedIn will treat them as three different keywords.

Unlike search engines, which can hold only a limited number of keywords in the search box, LinkedIn allows unlimited keywords, which is a major plus for recruiters searching for skill sets and other job-requirement keywords.




So it gives the user the freedom to use any number of keywords; by using the operators wisely we can create a complex query to get the desired result.

Example of a complex query to look for information security professionals who are not managers:

((Pentester OR “Security Analyst” OR “Consultant” OR “Security Consultant” 
OR “Information Security Engineer”) AND (Analyst OR “Security Engineer” OR 
“Network Security Engineer”)) NOT Manager. 






FIGURE 2.5 

LinkedIn advanced search result.
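Because these Boolean queries can grow long, it may be convenient to build them from keyword lists. The Python sketch below is our own illustration, not a LinkedIn API; it quotes multi-word titles, joins alternatives with OR, and excludes unwanted terms with NOT, producing a string that can be pasted into the LinkedIn search box:

    # Illustrative sketch: build a LinkedIn-style Boolean search string from
    # keyword lists. It only produces text for the search box.

    def quote(term):
        # Multi-word keywords must be quoted so they are treated as one term.
        return '"%s"' % term if " " in term else term

    def boolean_query(include, also=None, exclude=None):
        parts = "(" + " OR ".join(quote(t) for t in include) + ")"
        if also:
            parts = "(" + parts + " AND (" + " OR ".join(quote(t) for t in also) + "))"
        if exclude:
            parts += " NOT " + " NOT ".join(quote(t) for t in exclude)
        return parts

    print(boolean_query(
        include=["Pentester", "Security Analyst", "Consultant",
                 "Security Consultant", "Information Security Engineer"],
        also=["Analyst", "Security Engineer", "Network Security Engineer"],
        exclude=["Manager"],
    ))

Running this reproduces the example query shown above.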


TWITTER 

So, as we discussed earlier, Twitter is basically about microblogging in the form of tweets, and hence it allows us to search for tweets. Simply inputting a keyword will get us the tweets related to that keyword, but in case we need more specific results we need to use some advanced search operators. Let's get familiar with some of them.

In case we want to search tweets for specific phrases we can use quotation marks; for example, to search for the phrase pretty cool the query would be "pretty cool." To look for a hashtag we can simply type the hashtag itself (e.g., #hashtag). In case we want to search for a term but want to exclude another specific term, we can use the - operator. Say, for example, we want to search for hack but don't want the term security; then we can use the query hack -security. If we want the results to contain either one or both of the terms, then we can use the OR operator, such as Hack OR Security. To look for results related to a specific Twitter account, we simply search by its Twitter handle (@Sudhanshu_C). The filter operator can be used to get specific types of tweet results; for example, to get tweets containing links we can use filter:links. The From and To operators can be used to filter the results based upon the sender and receiver respectively, e.g., From:sudhanshujc, To:paterva. Similarly Since and Until can be used to specify the timeline of the tweet, e.g., hack since:2014-01-27, hack until:2014-01-27. All these operators can be combined to get better and much more precise results. To check out other features we can use the Twitter advanced search page at https://Twitter.com/search-advanced, which has some other exciting features such as a location-based filter.
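Because these operators are plain text, a filtered query can also be assembled in a script and opened as a Twitter search URL. The short Python sketch below is our own illustration of that idea (the example phrase and date are arbitrary); it simply joins the operators described above and URL-encodes them:

    # Illustrative sketch: combine Twitter search operators into one query
    # string and turn it into a search URL that can be opened in a browser.
    from urllib.parse import quote_plus

    operators = [
        '"bug bounty"',        # exact phrase
        'filter:links',        # only tweets containing links
        'since:2014-01-27',    # tweets on or after this date
    ]
    query = " ".join(operators)
    print("https://twitter.com/search?q=" + quote_plus(query))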





FIGURE 2.6 

Twitter advanced search options. 

SEARCHING ANY OPEN SOCIAL MEDIA WEBSITE 

So we have learned about social networks and how to search some of them, but what about the platforms we need to search that don't support any of the advanced search features we discussed? Don't worry, we have got you covered; there is a simple Google search trick which will help us out: the site operator. A Google search operator is simply a way to restrict the search results provided by Google within a specific constraint. What the site operator does is restrict the search results to a specific website only. For example, if we want to search for the word "hack," but we only want results from the Japanese Wikipedia website, the query we would input in Google is site:ja.wikipedia.org hack. This will give results for the word hack in the site we specified, i.e., ja.wikipedia.org. Now if we want to search multiple platforms at once, there is another Google operator which comes in handy: the OR operator. It allows us to get results for either of the keywords mentioned before and after it. When we combine it with the site operator it allows us to search for results from those specific platforms. For example, if we want to search the word "hack" in Facebook as well as LinkedIn, the Google query would be site:facebook.com OR site:linkedin.com hack. As we can see, these operators are simple yet very effective; we will learn more about such operators for Google as well as for some of the lesser known yet efficient search engines in the coming chapters.
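When the same restricted search has to be repeated for many keywords or platforms, the query can be generated in a script. The Python sketch below is a minimal illustration of the idea (the sites and keyword are just examples); it builds the query string and a Google search URL for it:

    # Illustrative sketch: build a Google query restricted to several sites
    # using the site: and OR operators, plus the corresponding search URL.
    from urllib.parse import quote_plus

    def site_restricted_query(keyword, sites):
        restriction = " OR ".join("site:" + s for s in sites)
        return restriction + " " + keyword

    q = site_restricted_query("hack", ["facebook.com", "linkedin.com"])
    print(q)                                            # the query itself
    print("https://www.google.com/search?q=" + quote_plus(q))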


WEB 3.0 

So we discussed Web 2.0, its relevance, and how it affects us, and also how to navigate through some of the popular social networks; now let's move forward and see what we are heading toward.




Until now, most of the data available on the web has been unstructured. Though there are various search engines like Google, Yahoo, etc., which continuously index the surface web, the data itself has no standard structure. What this means is that there is no common data format followed by the entire web. The problem with this is that though search engines can guide us to the information we are looking for, they can't help us answer complex queries or a sequence of queries. This is where the semantic web comes in. The semantic web is basically a concept where the web follows a common data format which allows meaning to be given to the data. Unlike Web 2.0, where human direction is required to fetch specific data, in the semantic web machines would be able to process the data without any human intervention. It would allow data to be interlinked not just by hyperlinks but by meaning and relations. This would allow not only data sharing but also processing across boundaries; machines would be able to relate data from different domains and generate meaning out of it. This semantic web is a crucial part of the web of the future, Web 3.0, and hence Web 3.0 is also referred to as the semantic web by many.
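To make the idea of machine-readable, interlinked data a little more concrete, here is a small sketch using the third-party Python library rdflib (the library choice and the example resource names are purely our own illustration): it stores statements as subject-predicate-object triples, the basic building block behind semantic web formats such as RDF.

    # Illustrative sketch of semantic-web style data: facts stored as
    # subject-predicate-object triples that machines can process and interlink.
    # Requires the third-party rdflib package (pip install rdflib).
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")       # hypothetical namespace
    g = Graph()

    alice = EX.alice
    g.add((alice, EX.name, Literal("Alice")))   # Alice's name is "Alice"
    g.add((alice, EX.knows, EX.bob))            # Alice knows Bob
    g.add((alice, EX.worksFor, EX.acme))        # Alice works for Acme

    # Serialize the graph in Turtle, a common RDF notation.
    print(g.serialize(format="turtle"))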

Apart from the semantic web there are many other aspects which will contribute toward Web 3.0, such as personalized search, context analysis, sentiment analysis, and much more. Some of these features are already becoming visible in parts of the web; they might not be mature yet, but the evolution is rapid and quite vivid.





CHAPTER 3

Understanding Browsers and Beyond


INFORMATION IN THIS CHAPTER 


• Browser’s basics 

• Browser architecture 

• Custom browsers 

• Addons 


INTRODUCTION 


In the first chapter we discussed a little about web browsers in general, then we moved on to shed some light on different popular browsers such as Chrome and Firefox and also tried to simplify the process behind browsing. Now it's time to understand what exactly happens in the background. You might wonder why this is required. As touched upon earlier in this book, the reason to focus on browsers and discuss their different aspects in detail is that the majority of tools we will use in the course of this book are mainly web based, and to communicate with those web-based tools we will use browsers a lot. That's why it is very important to understand how a browser works and what exactly goes on in the background when we do something in it. Learning the internal process of how a browser operates will help us choose and use it efficiently. Later we will also learn about ways to improve the functionality of our daily browsers. Now, without wasting much time on definitions and descriptions which we have already covered, let's get directly to the point, and that is "The secrets of browser operation."


BROWSER OPERATIONS 


When we open a browser, we will generally find an address bar where we can insert the web address that we want to browse; a bookmark button to save a link for future use; a show bookmarks button, where we can see all the bookmarked links we already have in the browser; back and forward buttons to browse pages accordingly; a home button to redirect from any page to the home page already set in the browser; and an options button to configure all the browser settings such as the home page, download location, proxy settings, and lots of other settings.



The location of these buttons might change between versions to provide a better user experience, but somewhere in the browser interface you will find all of these buttons for sure.

As all browsers have quite similar user interfaces, with most of the functionalities in common as discussed above, there are still some facilities and functionalities that make each browser unique. There are different popular browsers such as Chrome, Firefox, IE, Opera, and Safari, but as discussed earlier in Chapter 1, we will focus mostly on the two browsers which are also available in open source versions: Chrome and Firefox.


HISTORY OF BROWSERS 

The first browser was written by Tim Berners-Lee in 1991, and it only displayed text-based results. The first user-friendly commercial graphical browser was Mosaic. To standardize web technology, an organization named the World Wide Web Consortium, also known as W3C, was founded in 1994. Almost all of the browsers came into the market in the mid-1990s. Today browsers are much more powerful than they were in the early 1990s. The technology has evolved rapidly from text only to multimedia and is still moving on; today browsers display different types of web resources such as video, images, and documents along with HTML and CSS. How a browser should display these resources is specified by the W3C.


BROWSER ARCHITECTURE 

Browser architecture differs from browser to browser, so an architecture derived from the common components will be something as follows.







FIGURE 3.1 


Browser architecture. 





USER INTERFACE 

The user interface here is what we have already discussed above. It’s all about the 
buttons and bars to access the general features easily. 


BROWSER ENGINE 

It is the intermediary between the user interface and the rendering engine; it takes input from the UI and queries and drives the rendering engine accordingly.

RENDERING ENGINE 

It is responsible for displaying the requested web resources by parsing the contents. By default it can parse HTML, XML, and images. It uses different plugins and/or extensions to display other types of data such as Flash, PDF, etc.

There are different rendering engines such as Gecko, WebKit, and Trident. The most widely used rendering engine is WebKit or one of its variants. Gecko and WebKit are open source rendering engines while Trident is not. Firefox uses Gecko, Safari uses WebKit, Internet Explorer uses Trident, and Chrome and Opera use Blink, which is a variant of WebKit. Different rendering engines use different algorithms and have their own approaches to parsing a particular request. The best example to support this statement is that you might have encountered websites which work well only with a particular browser, because those websites are designed to be compatible with that browser's rendering engine, so in other browsers they don't work well.

NETWORKING 

This is a major component of a browser. If it fails to work, all other activities fail with it. The networking component can be described as a socket manager which takes care of resource fetching. It's a whole package which consists of application programming interfaces, optimization criteria, services, etc.

UI BACKEND

It provides basic user interface widgets and handles drawing of elements such as boxes, fonts, etc.

JAVASCRIPT INTERPRETER

It is used to interpret and execute JavaScript code.

DATA PERSISTENCE 

It is a subsystem that stores all the data the browser needs to save, such as session data. It includes bookmarks, cookies, caches, etc. Browsers store cookies containing a user's browsing details, which are often used by marketing sites to push advertisements.




Let's say we wanted to buy a headphone from some e-commerce site, so we visited that site but never bought it. From our browsing data, marketing sites will get this information and start pushing advertisements for the same product at us, maybe from that same e-commerce site or from others. This component definitely has its own importance.

ERROR TOLERANCE 

All browsers have traditional error tolerance to handle well-known mistakes and avoid invalid syntax errors. Browsers have this unique feature of fixing invalid syntax, which is why we never get an invalid syntax error in the result. Though different browsers fix these errors in different ways, all browsers do it in one way or another.

THREADS 

Almost every process is single threaded in all the browsers; however, network operations are multithreaded, using 2-6 parallel threads. In Chrome the tab process is the main thread, while in other browsers like Firefox and Safari the rendering process is the main thread.


BROWSER FEATURES 

Web browsing is a very simple and generic term that we are all aware of, but are we aware of its importance? A web browser opens a window for us to browse all the information available on the web. Browsers can be used for both purposes: online browsing as well as offline browsing. Online browsing is what we do regularly with an internet connection. Offline browsing means opening local HTML content in a browser. Modern browsers also provide features to save HTML pages for offline browsing. These features allow a user to read or go through something later without any internet connection; we have all used this feature at some time during our browsing experience. When we save a page for offline viewing, we might find that certain contents of the page are missing during offline browsing. The reason is that when we save a page it only saves the media directly available to the page, but if the page contains resources from other sites then those will be missing in the offline view. Let's discuss some of the added functionalities provided by browsers.

PRIVATE BROWSING 

Incognito is the term associated with Chrome for private browsing, whereas Firefox simply calls it private browsing. It allows us to browse the internet without saving details of what we browse for that particular browsing session. We can use private browsing for online transactions, online shopping, opening official mails on public devices, and much more.




In Firefox and Chrome we can find this option near the new window option. The shortcut key to open private browsing in Firefox is Ctrl+Shift+P, and for Chrome it is Ctrl+Shift+N. The difference between a normal browsing window and a private browsing window is an extra icon present in the title bar of the window: in Firefox it's a mask icon, whereas in Chrome it's a detective icon. For these fancy features, browsers use these kinds of fancy icons.





FIGURE 3.2 

Firefox private browsing. 


Private browsing will not save details of visited pages, form-fill entries, search bar entries, passwords, download lists, cached files, temp files, or cookies. However, data downloaded or bookmarks made during private browsing will be saved on the local system.

What does private browsing not provide?

It only helps the user stay anonymous on the local system; the internet service provider, network admin, or web admin can still keep track of the browsing details, and it will also not protect a user from keyloggers or spyware.

There is always an option available to manually delete the data stored by a browser. We can simply click on the clear recent history button, select what needs to be deleted, and it's done.
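Private windows can also be opened straight from the command line, which can be handy when scripting a quick session that leaves no local traces. The Python sketch below is only an illustration; the executable names ("firefox", "google-chrome") are assumptions that depend on how the browsers are installed on your system:

    # Illustrative sketch: launch a private/incognito browsing window from a
    # script. Adjust the executable names/paths for your own system.
    import subprocess

    # Firefox: -private-window opens the URL in a new private browsing window.
    subprocess.Popen(["firefox", "-private-window", "https://example.com"])

    # Chrome/Chromium: --incognito opens the URL in an incognito window.
    subprocess.Popen(["google-chrome", "--incognito", "https://example.com"])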


AUTOCOMPLETE 

Almost all browsers have this feature and can be configured to save certain information such as form details and passwords. This feature has different names in different browsers, or it is specific to different rendering engines. Some of the names are Password Autocomplete, Form Pre-filling, Form Autocomplete, RoboForm, Remember Password, etc.

Browsers give the user the freedom to configure whether to save this information or not, and if yes, whether to show some kind of prompt, what is to be saved, and in what form it should be saved.




In Firefox, to avoid password storage, go to Menu -> Options -> Security -> uncheck "Remember passwords for sites," though we can store passwords in encrypted form using the browser configuration.

In Chrome, go to Menu -> Settings -> Show advanced settings -> under Passwords and forms uncheck "Enable Auto-fill to fill out web forms in a single click" and "Offer to save your web password."

Some web applications treat this as a vulnerability or possible security risk, so they add the attribute autocomplete="off" to the forms or input boxes whose values they do not want a browser to save, but nowadays most browsers either ignore it or have stopped supporting this attribute and save all or some of the data based on the browser configuration.

PROXY SETUP 

The proxy setup feature is another important feature provided by any browser. It allows a user to forward the requests made by a browser to an intermediate proxy.

Most companies use some sort of proxy device to avoid data leakage, and those settings can be configured in the browser to limit or monitor the browsing process. Proxy options are also popularly used by penetration testers to capture the requests and responses sent and received by a browser. They generally use some interception proxy tool and configure its settings in the browser.

In day-to-day life, a proxy setup can also be used for anonymous browsing or for visiting pages that are country restricted. In that case a user just has to collect a proxy IP address and port number from some other country where that site or content is available and then set up the same in a browser to visit those pages.

Proxy setup in Firefox

Go to Menu -> Options -> Advanced -> Network -> Connection Settings -> Manual proxy configuration and add the proxy here.

Proxy setup in Chrome

Go to Menu -> Settings -> Show advanced settings -> under Network click on Change proxy settings -> click on LAN Settings -> check "Use a proxy server for your LAN (These settings will not apply to dial-up or VPN connections)" and add your settings.
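Besides the menu-driven settings above, Chromium-based browsers also accept a command-line flag that routes their traffic through a proxy, which is convenient when working with an interception proxy. The sketch below is our own illustration; the executable name and the proxy address (127.0.0.1:8080) are just examples:

    # Illustrative sketch: start Chrome/Chromium with all traffic sent through
    # a local interception proxy. Adjust the executable name for your system.
    import subprocess

    proxy = "http://127.0.0.1:8080"
    subprocess.Popen(["google-chrome", "--proxy-server=" + proxy])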


RAW BROWSERS 

There are specific browsers available by default with specific operating systems, such as Internet Explorer for Windows and Safari for Mac. Almost all browsers have versions available for different operating systems. But the widely used and popular browsers are not the ones which come preinstalled with the operating system but the ones which are open source and easily available for different operating systems, i.e., Mozilla Firefox and Google Chrome. Though Google Chrome is mostly used on the Windows operating system, one of its open source versions is commonly found preinstalled in many Linux operating systems and is called Chromium.




As the name suggests its similarity to the Google Chrome browser, their features also largely match, with only small differences.

As we saw earlier, there are different browser rendering engines like Gecko, WebKit, etc., and Chrome uses Blink, the variant of WebKit, and so does Chromium. The project initially started in 2008 and now has more than 35 updated versions. It is one of the popular browsers in the open source community. The idea of the Google Chrome window using the tab as the main process comes from the Chromium project, which set out to make a lightweight, fast, and efficient browser, sometimes described as a shell of the web, by making its tabs the main process. There are different other browsers released based on the Chromium project source code. Opera, RockMelt, and Comodo Dragon are some of the well-known browsers based on Chromium.

One thing is clear from the above paragraph: if a browser is open source, then the community will use that code to create other browsers by adding some extra functionality, as the Comodo Group added some security and privacy features to Chromium and released it as Comodo Dragon. Similarly, Firefox also has different custom versions. So let's consider the base version of a browser as the Raw browser and the others as customized browsers.

WHY CUSTOM VERSIONS? 

The custom versions are used for different purposes: to use the functionalities of the Raw browser to the fullest or, in simple words, to make better use of the features the Raw browsers provide. Custom browsers can help us serve our custom requirements. Let's say we want a browser which can help us stay online 24/7 on social networking sites. We can either add different social network addons to the browser of our choice to make it happen, or we can start from scratch and build a version of the browser which contains the required functionalities. Similarly for other cases: if we are penetration testers or security analysts, we might want a browser to perform different application security tests, so we can customize a browser for that. A normal user might need a browser that keeps them anonymous while browsing so that no one can keep track of what he/she is browsing; this can also be done by customizing a browser. There are already a number of customized browsers available in the market to serve these purposes, and similarly we can also create such a customized browser according to our desire. As the process is a bit too complex to be included in this chapter and would require some technical background to understand, we will not be discussing it; still, knowing that it is possible opens a new window, and for people who are interested, just take it as a self-learning project.

The Chromium project has its official website, http://www.chromium.org, where we can find different documentation and help materials to customize the browser for different operating systems. Apart from that, it is also maintained at SourceForge, http://www.sourceforge.net/projects/chromium. From here we can download the browser, download the browser source code, subscribe to the mailing list to get updated news about the project, and submit bugs and feature requests.




If you are interested in customizing Chromium, it will be a great kick start to subscribe to the mailing list as well as explore the documentation available on SourceForge. The first step in customizing any browser is to get its source code. So how do we get the source code of Chromium? It's quite easy: we just need to download the latest tar or zip version of the browser. By then untarring or unzipping it we will get the source code along with the documentation inside.
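As a small convenience, the downloaded archive can also be extracted with a few lines of Python instead of a manual untar/unzip; the sketch below is only an illustration, and the archive file name is a placeholder for whatever you actually downloaded:

    # Illustrative sketch: extract a downloaded Chromium source archive.
    # "chromium-src.tar.gz" is a placeholder file name.
    import tarfile
    import zipfile

    archive = "chromium-src.tar.gz"

    if archive.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive) as tar:
            tar.extractall("chromium-src")
    elif archive.endswith(".zip"):
        with zipfile.ZipFile(archive) as z:
            z.extractall("chromium-src")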

Now let’s move on to discuss some already customized browsers and their 
functionalities. 


SOME OF THE WELL-KNOWN CUSTOM BROWSERS 

EPIC (https://www.epicbrowser.com/) 

Epic is a privacy browser, as its tagline describes: "We believe what you browse and search should always be private." It is made to extend the online privacy of a user. This browser is based on the Chromium project, developed by the Hidden Reflex group, and is available for both Windows and OS X.

On visiting their official website, we find a paragraph with the heading "Why privacy is important?". This paragraph contains some unique and effective reasons; one is that the data collected from our browsing can decide whether we are eligible to get a job, credit, or insurance. Epic was first developed on top of Mozilla Firefox, but later it was changed to a Chromium-based browser. It works quite similarly to the private browsing feature of Firefox and Chrome. It deletes every bit of session data such as cookies, caches, and any other temporary data after exiting the browser. It removes the services provided by Chrome that send any kind of information to any particular server, and it adds a do-not-track header to avoid tracking by data collection companies. It also prefers SSL connections while browsing and contains a proxy to hide the user's IP address. To avoid leaking search preferences, Epic routes all search details through a proxy.

Here we saw a customized Chromium project, Epic, which was developed as a privacy-centric browser.

HconSTF (http://www.hcon.in/downloads.html) 

HconSTF stands for Hcon Security Testing Framework. It is a browser-based testing framework. With the package of different addons added to the browser, it allows a user to perform web application penetration testing, web exploit development, and web malware analysis along with OSINT in a semiautomated fashion.

HconSTF has two variants: one is based on Firefox and is known as Fire base, and the other is based on Chromium and is known as Aqua base. The rendering engines also differ as per the base Raw browser: Fire base uses Gecko and Aqua base uses WebKit. Both versions are loaded with tons of addons.

The core idea or inspiration for this project is taken from hackerfox, but it's not quite the same. Hackerfox http://sourceforge.net/projects/hackfox/ is portable




Arabic, Spanish, Turkish, French, Chinese simplified, and also Chinese traditional. As it is very popular in the security community, it comes installed by default in popular security operating systems such as Backtrack and Matriux.

With its security addons preinstalled and configured and its simple yet user-friendly interface, Mantra is an integral part of every web application pen tester's arsenal. The tools available in Mantra not only focus on web application testing but also on web services and network application penetration testing. It contains tools to switch user agents, manipulate cookies, manipulate parameters and their values, add a proxy, and many more. FireCAT is also included in Mantra, which makes it an even more powerful tool (we will cover FireCAT in the next topic separately).


Some of the popular tool groups are mentioned below:

• Information gathering
  • Flagfox
  • PassiveRecon
  • Wappalyzer
• Application audit
  • Rest Client
  • Hackbar
  • DOM Inspector
• Editors
  • Firebug
• Proxy
  • FoxyProxy
• Network utilities
  • FireFTP
  • FireSSH
• Misc
  • Event Spy
  • Session Manager




FIGURE 3.3

Mantra browser interface.




Apart from tools it also contains bookmarks. The bookmarks are divided into two sections. The first section is known as Hackery; it is a collection of different penetration testing links which will help a user in understanding and referring to a particular attack. The other section is the gallery; it contains links to all the tools that can be used for penetration testing.

We can download both versions of Mantra from the following URL, http://www.getmantra.com/download.html, or from the individual download links below. Mantra based on Firefox is available for different operating systems like Windows, Linux, and Macintosh, whereas MOC (Mantra on Chromium) is only available for Windows.

Mantra based on Firefox can be downloaded from http://www.getmantra.com/download.html.

Mantra based on Chromium can be downloaded from http://www.getmantra.com/mantra-on-chromium.html.


FireCAT (http://firecat.toolswatch.org/download.html) 

FireCAT, or Firefox Catalog of Auditing exTensions, is a mind map collection of different security addons organized in categories. It now collaborates with the OWASP Mantra project to provide a one stop solution for security addons based on browser customization. FireCAT contains seven different categories and more than 15 subcategories.


The categories and subcategories are:

• Information gathering
  • Whois
  • Location info
  • Enumeration and fingerprint
  • Data mining
  • Googling and spidering
• Proxies and web utilities
• Editors
• Network utilities
  • Intrusion detection system
  • Sniffers
  • Wireless
  • Passwords
  • Protocols and applications
• Misc
  • Tweaks and hacks
  • Encryption/hashing
  • Antivirus and malware scanner
  • Antispoof
  • Antiphishing/pharming/jacking
  • Automation
  • Logs and history
  • Backup/synchronization
  • Protection
• IT security related
• Application auditing




The "IT security related" category is an interesting one because it provides plugins to collect information about common vulnerabilities and exposures (CVEs) and exploits from various sources such as the Open Sourced Vulnerability Database (OSVDB), Packet Storm, SecurityFocus, Exploit-DB, etc.

ORYON C (http://sourceforge.net/projects/oryon/) 

Oryon C Portable is an open source intelligence framework based on the Chromium browser, meant for open source intelligence analysts and researchers. Like other customized browsers it comes with lots of preinstalled tools and addons to support OSINT investigations. It also contains links to different online tools for better reference and research. It is a project by "osintinsight," so some of the functions can only be used after subscribing to one of the Osintinsight packages.


FIGURE 3.4

Oryon C browser interface.

It’s a straightaway use tool so no need to install Oryon C, just download and 
run it. It only supports Windows operating system 32 and 64 bit. The huge list 
of useful addons and categorized bookmarks makes it a must-have for any online 
investigator. 


WhiteHat Aviator (https://www.whitehatsec.com/aviator/) 

Though WhiteHat Aviator is not the only one of its kind available, it is definitely the product of a reliable, big-brand security organization. WhiteHat Aviator is a private browsing browser, quite similar to the Epic browser we discussed earlier in




this chapter. It removes ads and eliminates online tracking to ensure that a user can surf anonymously.

Like the Epic browser, Aviator is also based on Chromium. By default it runs in incognito or private browsing mode to allow a user to surf without storing any history, cookies, temporary files, or browsing preferences. It also disables autoplay of different media types; the user has to explicitly allow media such as Flash on a page if he/she wants to see it. It also uses the private search engine DuckDuckGo to avoid storing the search preferences of the user.

Unlike the Epic browser it is not open source, so the security community cannot audit the code or contribute much. Aviator is available for Windows as well as the Macintosh operating system.


TOR BUNDLE (https://www.torproject.org/projects/torbrowser.html.en)

TOR, or the onion routing project, is a very popular project. Most of us have definitely used, heard, or read about it at some point. Though we will discuss it in detail in a later chapter, for the time being let's cover the basics of the Tor browser bundle. Like the Epic browser and WhiteHat Aviator, the Tor browser is also a privacy-centric browser, but the way it works is quite different from the other two. Through the Tor application it uses a volunteer-run distributed relay network and bounces traffic around before sending or receiving a connection. This makes it difficult to backtrack the location of the user and provides privacy and anonymity. Due to this proxy-chaining style of operation it can also be used to view content that is blocked for a particular location, such as a country. The Tor browser is available for different operating systems such as Windows, Linux, and Macintosh and can be used straightaway without installation. The Tor browser, previously known as TBB or the Tor browser bundle, is a customized browser based on Firefox. It contains Torbutton, Tor Launcher, a Tor proxy, HTTPS Everywhere, NoScript, and lots of other addons. Like OWASP Mantra it is also available in 15 different languages.


CUSTOM BROWSER CATEGORY 

We have come across different custom browsers, their base builds, the rendering engines they use, and so on. Let's categorize them to understand their usability.

For easy understanding let's make three categories.

1. Penetration testing 

2. OSINT 

3. Privacy and anonymity 

Under the first category we can place HconSTF, Mantra, and FireCAT; under the OSINT category we can add HconSTF and Oryon C; and we can put the Epic browser, WhiteHat Aviator, and the Tor browser under the privacy and anonymity category. If we look at the core, what puts all these different browsers in different




categories, the answer is the addons or extensions. So by adding addons with similar functionality we can create a customized browser for a specific purpose. If we want to create our own browser for some specific purpose we must keep this in mind.


PROS AND CONS OF EACH OF THESE BROWSERS 

Let’s start with the first browser we discussed and that is Epic browser. The advan¬ 
tage of using this browser is that it fully focuses on user privacy and anonymity. Apart 
from that it’s open source and it can be used by all kind of users, technical as well as 
nontechnical. The only disadvantage is that the reliability factor. Is this browser does 
what it intends to do or does it do something else. As trust on the source is the key 
here. So either trust the source and use the product or use it then trust the product. 

The advantage of using HconSTF is that it’s a one stop solution for information 
security researchers. The only disadvantage it has is that it does not allow a user to 
upgrade it to the next level. 

The advantage of OWASP Mantra is that it is available in different languages to 
support security community from the different parts of the world. It has only one 
disadvantage is that the light version or the MOC is only available for Windows, not 
for other operating systems like Linux or Macintosh. 

The advantage of Oryon C is that it is very helpful in OSINT exercises, but there 
are different disadvantages like to use some of the modules a user need to subscribe 
and also it is only available for Windows. 

The disadvantage of the Whitehat Aviator is that it is not open source and it does 
not have a version for Linux operating system. 

TBB has the advantage is that it provides anonymity with a disadvantage like it 
only comes with one rendering engine Gecko. 

As we already discussed these custom browser categories; based on category, 
user can choose which browser to use, but definitely the browsers for anonymity and 
privacy have larger scope as they do not belong to any single category of users. Any 
user who is concern about his/her online privacy can use these browsers. Like for 
e-shopping, netbanking, social networking as well as e-mailing, these browser can 
be helpful to all. 


ADDONS 

Browser addons, browser extensions, and plugins are often used to mean the same thing, but the terms differ between browsers: in Firefox they are known as addons and in Chrome as extensions. Though a plugin is technically a different component from an addon, some still use the word as a synonym for addon; in reality a plugin can be part of an addon.




Browser addons are typically used to enhance the functionality of a browser. They are nothing but applications designed using web technologies such as HTML, CSS, and JavaScript. Though, due to differences in rendering engines, the structure and code differ between browsers, nowadays there are various tools and frameworks available to design cross-browser addons.

Addons are so popular that almost every web user might already have used one at some point or another. Some popular addons are YouTube downloaders and Google Translate for general use, and SOA Client, REST Client, and Hackbar in the case of penetration testers.

We can install addons quite easily in both browsers, Firefox as well as Chrome, by simply clicking the install button. Addons are not always safe, so choose them wisely: download them from trusted sources and only after going through the reviews. Sometimes we need to restart the browser to run a particular addon. Like other software, addons keep looking for updates and update themselves automatically. Sometimes we might see that an addon is not compatible with the browser version, which means one of two possibilities: (1) the browser version is outdated, or (2) the addon has not been updated to match the requirements of the latest browser installed. It is also possible that an addon might affect the performance of a browser and even make it slow. So choose your addons wisely.

Let’s discuss some common addons and extensions that are available for both 
Firefox as well as Chrome to serve in day-to-day life. Let’s see what kind of addons 
are available and to serve what purpose. 

We all use YouTube to watch video and share. Sometimes we also want to 
download some YouTube videos so for that a number of addons are available 
by installing which we do not need to install any other additional downloading 
software. Another major issue we feel while watching videos in YouTube is that 
the ads. Sometime we are allowed to skip the ads after 5 s and sometime we have 
to watch the full 20 s ad. That is pretty annoying so there are addons available to 
block ads on YouTube. Most of the people are addicted to social networking sites, 
we generally open one or all of these at least once every day. Social networks like 
Facebook, Linkedln, Twitter are like part of our life now. Sometime we need to 
see the pictures of our friends or someone else in social networking sites and we 
need to click on the picture to zoom that. It wastes lots of valuable time, so if we 
want an addon to zoom all those for us when we point your mouse on the picture 
then there is addons available known as hoverzoom both in Firefox as well as 
Chrome. 

There are different addons also available for chat notification, e-mail notification, 
news, weather. It looks like think there are addons available for almost everything, 
we just need to explore and definitely we will get one that will simplify our life. This 
is just for brainstorm, now let’s discuss about some of the popular addons which will 
help us in various different important tasks. 




SHODAN 

It is a plugin available for Chrome. A user just has to install it and forget about it. While we browse an application, it collects the information available about the particular site from the Shodan database and provides details such as the IP address of the website, who owns that IP, where it is hosted, along with open ports running popular services and some well-known security vulnerabilities such as Heartbleed. This is definitely very helpful for penetration testers; if you haven't tried it yet, you must. The only limitation of this addon is that it will only show results for sites for which information is already available in Shodan's sources. It generally won't show results for new sites and staging sites, as its database might not contain information on them.
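The same data the addon displays can also be pulled programmatically. Below is a minimal sketch using the official shodan Python library (pip install shodan); the API key and IP address are placeholders.

import shodan

api = shodan.Shodan("YOUR_API_KEY")      # placeholder key

# Look up everything Shodan has indexed about a single IP address.
host = api.host("8.8.8.8")               # placeholder IP

print("Organization:", host.get("org"))
print("Open ports:", host.get("ports"))
for item in host.get("data", []):
    print(item["port"], item.get("product", "unknown service"))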


WAPPALYZER 

It is also a popular addon, available for both Firefox and Chrome. It uncovers the technology used by a web application. Similar to Shodan, with Wappalyzer we simply need to install and forget; it shows details about the technology used while we browse a page. The way Wappalyzer works is that it collects information related to the technology and its version from the response headers, source code, and other sources, based on signatures (a minimal sketch of this approach appears after the list below).

It identifies various technologies such as CMSs (content management systems), e-commerce platforms, web server details, operating system details, JavaScript framework details, and many other things.


Some of the types of technologies identified by Wappalyzer are:

• Advertising networks
• Analytics platforms
• Content management systems
• Databases
• E-commerce
• Issue trackers
• JavaScript frameworks
• Operating systems
• Programming languages
• Search engines
• Video players
• Wikis
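As a rough illustration of signature-based detection (the signature strings below are illustrative only, not Wappalyzer's real rule set), a naive fingerprinting sketch might look like this:

import requests

HEADER_SIGNATURES = {
    "Apache": ("Server", "Apache"),
    "nginx": ("Server", "nginx"),
    "PHP": ("X-Powered-By", "PHP"),
}
BODY_SIGNATURES = {
    "WordPress": 'content="WordPress',
    "Drupal": 'content="Drupal',
}

def fingerprint(url):
    # Fetch the page once and check headers and body against the signatures.
    resp = requests.get(url, timeout=10)
    detected = set()
    for tech, (header, token) in HEADER_SIGNATURES.items():
        if token.lower() in resp.headers.get(header, "").lower():
            detected.add(tech)
    for tech, token in BODY_SIGNATURES.items():
        if token.lower() in resp.text.lower():
            detected.add(tech)
    return detected

print(fingerprint("https://example.com"))   # hypothetical target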


BUILTWITH

BuiltWith is similar to Wappalyzer. It also identifies the technologies used by a web application based on signatures, using the page source code, banners, cookie names, etc. While Wappalyzer is open source, BuiltWith is not. The paid version of BuiltWith has many more features than the free version, such as contact information detection and subdomain detection, which can be very helpful at times.




FIGURE 3.5

BuiltWith identifying technologies on Twitter.


FOLLOW 

Follow.net is a competitive intelligence tool which helps us stay updated on the online movements of our competitors and can be accessed using the browser addon it provides. The major difficulty in keeping track of a competitor is that we have to spend lots of time visiting their websites, blogs, tweets, YouTube channels, etc., and even after visiting lots of websites we still don't have structured data from which we can understand the trend being followed. Follow.net does most of this and much more for us and provides a report on how our competitor is trending on the web. It collects information from various sources such as Alexa, Twitter, KeywordSpy, etc. It will also send us notifications related to our competitors if something new comes up. The simple Follow addon provides a complete interface to browse through all this information in an efficient manner.

So if we are starting a business and want to learn the success mantra of our competitors, then it is a must-have. The follow.net addon is available for both the Firefox and Chrome browsers.


RIFFLE 

Riffle by CrowdRiff is a social analytics addon focused on the popular microblogging site Twitter. It provides us with a smart Twitter dashboard which displays useful analytical data about a Twitter user of our choice.

It provides helpful information for building a popular account by referencing influential tweets and the accounts that posted them. It also provides quick insights about a Twitter user, which helps us understand and reply to that particular user in an appropriate way.




FIGURE 3.6

Riffle interface integrated into the browser.


Some of the key features provided by this extension are tweet source tracking, activity breakdown, engagement assessment, etc., all with a clean user interface. It's a must-have for power users of Twitter.


WhoWorks.at 

Just as Riffle is a Twitter-focused addon, whoworks.at is a LinkedIn-specific one. Let's take a scenario where we are salespersons and need to gather information about the key influential persons of a company; how do we proceed? We go to LinkedIn, search for that particular company, and then find the 1st-, 2nd-, or 3rd-degree connections. Based on their titles we might want to add them to discuss business. This is the old-fashioned way. There is another way to do the same in a more automated manner: install the whoworks.at extension on Chrome, visit the website of the company we are interested in, and let the extension show us the 1st-, 2nd-, and 3rd-degree connections from that company along with details such as recent hires, promotions, or title changes.

This is the power of whoworks.at: it finds the connections for us when we visit a website and saves us a lot of time.

ONETAB 

OneTab is an addon or extension available for both Firefox and Chrome. It provides a solution for tab management: it takes the tabs that are open in our browser and aggregates them under a single tab, which is especially useful




in Google Chrome since, as we already learned, it is a tab-centric browser. The tab is the main thread in Chrome, so by using OneTab we can save a lot of memory because it converts tabs into a list, which we can later restore one by one or all at once, as we wish.


SALESLOFT 

Most salespeople must have used it; if not, they need to. It's simply a dream addon for salespersons. It allows us to create a prospecting list by browsing profiles on different social networks for leads focusing on a particular market segment, and lets a user run specific searches based on title, organization, or industry name. Some of its popular features are: gathering contact information about a prospect from LinkedIn (name, e-mail id, and phone number); adding any result as a prospect with a single click; importing prospects from LinkedIn and exporting them to Excel or Google Spreadsheets; and synchronizing the data directly with salesforce.com.

It is a one stop, free, and lightweight solution for every salesperson. Use it and enhance your lead generation with its semiautomated approach.


PROJECT NAPTHA 

We all know it is nearly impossible to copy the text present in an image; one method is to type it out manually, but that is definitely a tedious experience. So here is the solution: Project Naptha. It is an awesome addon which gives us the freedom to copy, highlight, edit, and even translate the text on any image present on the web using its advanced OCR technology. It's available for Google Chrome.

TINEYE 

TinEye is a reverse image search engine, and its addon is used for the same purpose. Just as we enter keywords into search engines to get the required results, TinEye can be used to search for a particular picture in the TinEye database, which has a huge number of images indexed. The idea behind the image identification technology is that it creates a unique signature for each and every image it indexes. When a user searches for a picture it compares that signature, and most of the time it gives an exact result; apart from exact matches it also gives similar results. Another great feature of TinEye is that it can search for cropped, resized, and edited images and still give an almost exact result. TinEye is available for both Firefox and Chrome.
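TinEye's actual signature algorithm is proprietary; the sketch below only illustrates the general idea of image fingerprinting using a simple perceptual hash. It assumes the third-party Pillow and imagehash libraries, and the image file names are hypothetical.

from PIL import Image
import imagehash

# Compute a perceptual hash for two images; visually similar images
# (resized, lightly edited) produce hashes that differ in only a few bits.
hash_a = imagehash.average_hash(Image.open("original.jpg"))
hash_b = imagehash.average_hash(Image.open("resized_copy.jpg"))

# A small Hamming distance suggests the two files show the same picture.
print("Distance:", hash_a - hash_b)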


REVEYE 

RevEye is quite similar to TinEye. This addon is only available for Chrome and works very simply: it gives the user reverse image search results based on both Google reverse image search and TinEye reverse image search.




CONTACTMONKEY 

ContactMonkey is a very useful addon for all professionals, especially in sales. It helps us track our e-mails. Using this simple addon we can identify whether the person we sent an e-mail to has opened it, and at what time. This helps us identify whether our mails are being read or are simply filling up the spam folder, and also what the best time to contact a person is. Though the free version has some limitations, it is still very useful.

If you want to improve your user experience of the Google Chrome browser, this list by Digital Inspiration is a must to look at. It contains Chrome extensions and apps which will enhance Chrome's features as well as the user experience. The list is available at http://digitalinspiration.com/google-Chrome.


BOOKMARK 

Bookmarking is a common feature of every browser. It allows us to save a website's URL under a name for later use. While browsing we often come across interesting pages but, due to lack of time, cannot go through them all right then. Bookmarks help us save those links for future use.

There are two popular ways to save a bookmark:

1. By clicking on the bookmark button when we are on the page that needs to be bookmarked.

2. By pressing Ctrl+D when we are on the page that needs to be bookmarked.

We can import as well as export bookmarks from one browser to another, and we can also create a new folder for a list of bookmarks. In Firefox we need to go to the Show All Bookmarks link or press Ctrl+Shift+B, where we get all those options directly or by right-clicking on the page. Similarly, for Chrome we need to go to the bookmark manager; there we also find all the options on the page itself, or else we need to right-click on the page to get them.


THREATS POSED BY BROWSERS 

As we discussed, browsers are a great tool for accessing the web, and the availability of various addons simply enhances their functionality. This wide usage of browsers also presents a huge threat. Being among the most widely used pieces of software, browsers are a favorite attack vector of many cyber attackers. Attackers try to exploit most client-side vulnerabilities through the browser alone: phishing, cookie theft, session hijacking, cross-site scripting, and lots of others. Browsers are also one of the biggest actors playing a role in identity leakage. So use your browser wisely. In later chapters we will discuss some methods to stay secure and anonymous online. For now let's move on to our next chapter, where we will learn about various types of unconventional but useful search engines.





CHAPTER 4

Search the Web—Beyond Convention


INFORMATION IN THIS CHAPTER 

• Search engines 

• Unconventional search engines 

• Unconventional search engine categories 

• Examples and usage 


INTRODUCTION 


In the second chapter we learned how to utilize the advanced search features of some social network platforms to get precise results; in the third chapter we moved on to see how to better utilize our common browsers in uncommon ways; and now this chapter is about search engines.

We are all familiar with search engines and use them for our day-to-day research.

As discussed in previous chapters, what search engines basically do is crawl the web using web spiders and index web pages based on a wide range of parameters, such as keywords, backlinks, etc.; based on this index we get our results for the keywords we supply. Some of the most popular search engines are Google, Yahoo, and Bing.

Different search engines use different methods to rate links and, based on their algorithms, assign different ranks to different websites. When we search for a term(s), the search engines provide results based upon these ranks. These ranks keep changing based on various factors, which is why we might get different results for the same query on different dates.

So it is safe to say that as average users we are familiar with search engines and their usage. As stated earlier, this chapter is about search engines, but not the conventional ones we use daily. The search engines we will be dealing with in this chapter are specialized; some of them perform their search operations in a different manner and some provide a search facility for a specific domain. But are they really required when we have search engines like Google, which are very advanced and keep updating with new features? The short answer is yes. Though search engines like Google are very good at what they do, they provide generic results in the form of website links which, according to them, are relevant for the query keywords; sometimes, however, we need specific answers related to a specific domain, and this is when we need specific types of search engines. Let's go ahead, get familiar with these, and find out how useful they are.



META SEARCH 

When we send a request to a regular search engine, it looks into its own database for the relevant results and presents them; but what if we want to get results from multiple search engines? This is where meta search engines come in. What meta search engines do is send the user's query to multiple data sources, such as search engines, databases, etc., at once and aggregate the results into a single interface. This makes the search results more comprehensive and relevant and also saves the time of searching multiple sources one at a time. Meta search engines do not create a database of their own, but rely on various other databases for collecting the results (a toy aggregator of this kind is sketched below).
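The following is a minimal sketch of the aggregation idea under stated assumptions: the two endpoint URLs are placeholders, not real APIs, and each is assumed to return JSON of the form {"results": [{"title": ..., "url": ...}]}.

import concurrent.futures
import requests

# Placeholder endpoints standing in for real search APIs.
SOURCES = {
    "engine_a": "https://api.engine-a.example/search",
    "engine_b": "https://api.engine-b.example/search",
}

def query_source(name, url, term):
    resp = requests.get(url, params={"q": term}, timeout=10)
    results = resp.json().get("results", [])[:5]     # keep the top results only
    return [(name, r["title"], r["url"]) for r in results]

def meta_search(term):
    merged = []
    # Query all sources in parallel and merge whatever comes back.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_source, n, u, term) for n, u in SOURCES.items()]
        for future in concurrent.futures.as_completed(futures):
            merged.extend(future.result())
    return merged

for source, title, link in meta_search("open source intelligence"):
    print(source, "-", title, "-", link)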

Polymeta (http://www.polymeta.com/)

Polymeta is a great meta search engine which sends the search query to a wide range of sources, takes the top results from each one of them, and ranks them further. The search results of Polymeta not only contain the URLs but also their social network likability through the number of Facebook likes for each URL. We can further drill down into the results through the "search within" feature, which allows us to search for keywords inside the already aggregated results.



<- - G polymeta.com/searchresult.jsp?sc=1136&q=search+engine&un=anonymous 

::: Apps CD People C] Company C] Social Media C] News C] Competitive Intellig... CD Search Engines CD Chrome Addons 


Web News Images Videos Blogs 


lEngii - 


PolyMetavffl [search engine 


Search I Search within] Clear 


Select Sources 


50 Results O for: search er 
Topics i nr nr, i - 


All Results (50) 

5 search engine... (42) 

o search engine 
optimization (4) 

o Search Engine 
Marketing (4) 

» Search Engine 
Land (3) 

▼ More 

e Search the web (7) 
e results (3) 
e metasearch (3) 

« Google (7) 
e Guide (4) 
e Bing (3) 
e images (2) 


Web Results 1 12131415| Next > 

1 Ixqnick Search Eng ined 

12.937 people like mis Sign Up to see what your friends we. 

Ixquick search engine provides search results from over ten best search engines 
wwwixquickcom Google 

2 . Ppqp ilfi. 

lili3 5,447 people ike this. Sign Up to see what your friends like. 

InfoSpace metasearch engine offering search of the general web, or images, 
www.dogpile com Google 

3. Wck sea rch e ngi ne - Wi kipedia, the f ree enc y clopedia^ 

369 people like this. Sign Up to see what your friends like. 

A web search engine is a software system that is designed to search for 
en.wikipedia.org/wiki/Web_search_engine Google 

4. DuckPucK OfiO. 

103.347 people like this. Sign Up to see what your friends Ike. 

The search engine that doesn’t track you. ... Involved: Community ■ Feedback 
duckduckgo.com Google 

5 Yahoo Search - Web Search 

7.608 people Ike this Sign Up to see what your friends like. 

The search engine that helps you find exactly what you're looking for. Find the 
search.yahoo.com Google 


News 

Chinese search engine Baidu 


goes 


live in Brazil 

Chinese search engine Baidu has 
finally started to operate in Brazil on 
Thursday, nearly two years after its 
developer set up an office in the 
www.zdneLcom/chinese-search- 
engme-baidu-goes-lrve-in-brazil- 
7000031771 


H 

Google. 

YaHo°'V> 

a>. 

C ing / + i 

Abut I 1 

MoreN^j 


m 


FIGURE 4.1 

Meta search engine—Polymeta. 

Polymeta categorizes the results into topics, which are displayed inside a panel on the left; results for news, images, videos, and blogs are displayed in separate panels on the right. It also allows us to select the sources from a list for different categories.




Ixquick (https://www.ixquick.com)

Ixquick is another meta search engine and, in its own words, is "the world's most private search engine." Apart from its great capability to search and present results from various sources, it also provides a feature to access the results through the Ixquick proxy. In the search results themselves, below every result there is an option named "proxy"; clicking on it takes us to the result URL but through the proxy (https://ixquick-proxy.com), which allows us as users to maintain our anonymity.

Apart from the regular web, image, and video search, Ixquick provides a unique search capability, i.e., phone search. We can not only search for the phone number of a person but can also do a reverse phone search, meaning we provide the phone number and choose the country code and it fetches the information of the owner. Not only this, the phone search functionality also allows us to search for the phone numbers of businesses; we simply need to provide the business name and location details. Ixquick also provides advanced search, which can be accessed at https://www.ixquick.com/eng/advanced-search.html.



FIGURE 4.2 

Ixquick phone search. 

Mamma (http://mamma.com/)

Mamma is yet another meta search engine. Similar to any meta search engine it aggregates its results from various sources, but that is not all that makes it stand out. The clean and simple interface provided by Mamma makes it very easy to use even




for a first-time user. The result page is clean and very elegant. We can access various categories such as news, images, video, etc. through simple tabs which are integrated into the interface itself once used. Clicking on the Local button allows us to get region-specific results.

The tabulation feature we just discussed creates different tabs not only for categories but also for different queries, which allows us to easily access results from a previous search.


PEOPLE SEARCH 

Now that we have a fair understanding of how meta search works, let's move on to learn how to look for people online. There are many popular social media platforms like Facebook (facebook.com), LinkedIn (linkedin.com), etc. where we can find out a lot about people; here we will discuss search engines which index results from platforms like these. In this section we will learn how to search for people online and find related information. The information we expect from this kind of engagement is full name, e-mail address, phone number, address, etc.; all of this can be used to extract further information. This kind of information is very relevant when we require information about a person to perform a social engineering attack for an InfoSec project or need to understand the persona of a potential client.

Spokeo (http://www.spokeo.com) 

When it comes to searching for people, especially in the US, no one comes close to this people search engine. Though most of the information provided by it is now paid, as opposed to its previous versions, speaking from past experience it is a great platform which provides a variety of information related to a person, ranging from basic details such as name, e-mail, and address to information like neighborhood, income, social profiles, and much more. It allows us to search for people by name, e-mail, phone, username, and even address. The price of the information packages it provides seems reasonable, and it is recommended for anyone who deals with digging up information about people.

Pipl (https://pipl.com/)

Pipl is a great place to start looking for people. It allows us to search using a name, e-mail, phone number, and even username. The search results can be further refined by providing a location. Unlike most search engines, which crawl through the surface web only, Pipl digs into the deep web to extract information for us (the concept of the deep web will be discussed in detail in a later chapter); this unique ability allows it to provide results which other search engines won't be able to. The results provided are pretty comprehensive and are also categorized into sections such as Background, Profiles, Public Records, etc. The results can also be filtered based upon age. All in all it is one of the few places which provide relevant people search results without much effort and hence must be tried.





FIGURE 4.3 

Searching people using Pipl. 

PeekYou (http://www.peekyou.com/) 

PeekYou is yet another people search engine which not only allows searching using the usual keyword types such as name, e-mail, username, phone, etc. but also using terms like interests, city, work, and school. These unique types make it very useful when we are searching for alumni or coworkers or even people from the past with whom we lived in the same city. The sources of information it uses are quite wide, and hence so are the results, and the best part is it's all free.

Yasni (http://www.yasni.com/)

Yasni is a tool for people who want to find people with specific skill sets. It not only allows us to search for people by their name but also by the domain they specialize in or their profession. The wide range of result categories provided by Yasni makes it easy to find the person of interest. Some of the categories are images, telephone and address, interests, business profile, documents, and much more. This platform provides a one stop shop for multiple requirements related to searching for people online.

LittleSis (https://littlesis.org/)

LittleSis is not exactly a general people search engine, but is more focused on people at the top of the business and political food chain, so searching for common people here would be a waste of time. It is, however, good at what it does and can reveal




interesting and useful information about business tycoons and political czars. Apart from basic information such as introduction, DoB, sex, family, friends, education, etc., it also shows information like relationships, which lists the positions and memberships the person holds or has ever held, and interlocks, which lists people with positions in the same organizations. It is a good place to research people with power and who they are associated with.

MarketVisual (http://www.marketvisual.com/)

MarketVisual is also a specialized search engine which allows us to search for professionals. We can search for professionals by their name, title, or company name. Once the search is complete it presents a list of entities with associated information such as the number of relationships, title, and company. The best part about MarketVisual is the visualization it creates of the relationships of an entity once we click on it. These data can further be downloaded in various formats for later analysis. It is a great tool for market research.



FIGURE 4.4 

MarketVisual displaying connection graph.


They Rule (http://theyrule.net/) 

Similar to MarketVisual, They Rule also provides a medium to search for professionals across top global corporates. At first look the interface makes us doubt whether there is actually any information, as there is only a small list of links in the top left




corner, and that too in a smaller than average font size; but once we start to explore these links we find an ocean of interesting information. Clicking on the companies link provides a huge list of companies; once we click on a company it presents a visual representation of it. Hovering over its icon provides the option to show directors and research further. The directors are also represented through the visualization. If any director is on more than one board, hovering over his/her icon provides the option to show that as well. It also provides an option to find connections between two companies. Apart from this it also lists interesting maps created by others, such as Too Big To Fail Banks, and lets us save our own.

BUSINESS/COMPANY SEARCH 

Today almost every company has an online presence in the form of a website, one or more social media profiles, etc. These mediums provide a great deal of information about the organization they belong to, but sometimes we need more. Be it researching a competing business, a potential client, a potential partner, or simply the organization where we applied for an opening, there are platforms which can help us understand them better. Let's learn about some of them.

LinkedIn (https://www.linkedin.com/vsearch/c)

LinkedIn is one of the most popular professional social media websites. We have already discussed LinkedIn search in a previous chapter, but when it comes to searching for companies we simply can't ignore it. Most tech-savvy corporates do have LinkedIn profiles. These profiles list some interesting information which is usually not found on corporate websites, such as company size, type, and specific industry. It also shows the number of employees of the company who have a profile on the platform. We can simply see the list of these employees and check their profiles, depending upon who/what we are looking for. Apart from this we can also see regular updates from the company on their profile page and understand what they are up to. It also allows us to follow companies using a registered account so that we can receive regular updates from them.

Glassdoor (http://www.glassdoor.com/Reviews/index.htm)

Glassdoor is a great platform for job seekers, but it also provides a huge amount of relevant information on companies. Apart from the usual information such as company location, revenue, competitors, etc., we can also find information such as employee reviews, salaries, current opportunities, as well as interview experiences. The best part is that the information is provided not just by the organization itself but also by its employees; hence it gives a much clearer view of the internal structure and working. Similar to LinkedIn, Glassdoor also provides an option to follow company profiles to receive updates.





FIGURE 4.5 

Glassdoor company search interface. 

Zoominfo (http://www.zoominfo.com/) 

ZoomInfo is a business-to-business platform which is mainly used by sales and marketing representatives to find details about companies as well as the people working in them, such as e-mail, phone number, address, relationships, etc. Though the free account has various limitations, it's a great tool to find information about organizations and their employees.

REVERSE USERNAME/E-MAIL SEARCH 

Now that we have learned how to extract information related to people and companies, let's take this a step further and see what other information we can extract using the username of a person, which in most cases is the e-mail address of the person.

EmailSherlock (http://www.emailsherlock.com/) 

EmailSherlock is a reverse e-mail search engine. Once we provide an e-mail address to it, it looks up whether that e-mail has been used to register an account on a wide range of websites, mostly social media, and gets us the results in the form of any information it can extract from these platforms. This kind of information can be very helpful in case we just have the e-mail address of the person of interest. Once we know the platforms on which this particular person is registered, we can go ahead and create an account on them and might be able to extract information which we were not allowed to access otherwise. Similar to EmailSherlock there is another service




called UserSherlock (http://www.usersherlock.com/) which does the same thing for usernames.

Though the results provided by these services are not 100% accurate, they provide a good place to start.



FIGURE 4.6

EmailSherlock interface.

CheckUsernames (http://checkusernames.com/)

Similar to UserSherlock, CheckUsernames runs the username provided to it through a huge list of social media websites and checks whether that username is available on them or not.

Namechk (http://namechk.com/)

Like CheckUsernames and UserSherlock, Namechk also checks the availability of 
the provided username on a huge list of social media sites. 

KnowEm (http://knowem.com/)

The website discussed above (checkusernames.com) is powered by KnowEm, and it can similarly be used to check for usernames, but it additionally checks for domain names as well as trademarks (a rough sketch of the kind of check these services perform follows).
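The sketch below is a rough illustration of such a check, not the method any of these services actually uses: it requests a profile URL on a few sites and infers existence from the HTTP status code. The URL patterns and the username are assumptions, and real sites may block or rate-limit this kind of probing.

import requests

SITES = {
    "Twitter": "https://twitter.com/{}",
    "GitHub": "https://github.com/{}",
    "Instagram": "https://www.instagram.com/{}/",
}

def check_username(username):
    headers = {"User-Agent": "Mozilla/5.0"}    # generic user agent
    for site, pattern in SITES.items():
        url = pattern.format(username)
        try:
            status = requests.get(url, headers=headers, timeout=10).status_code
        except requests.RequestException:
            status = None
        taken = status == 200                  # 404 usually means the name is free
        print(site, url, "taken" if taken else "available/unknown")

check_username("johndoe")                      # hypothetical username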

Facebook (https://www.facebook.com/)

Unlike most social network sites, Facebook allows us to search for people using e-mail addresses, and being one of the largest social networks it can be very helpful when searching for people online.





SEMANTIC SEARCH 

In chapter 2 we discussed the semantic web and how it will be an integral part of the web of the future. Let's get familiar with some semantic search engines and see how mature they are.


DuckDuckGo (https://duckduckgo.com) 

Though the name DuckDuckGo may sound a bit odd for a search engine, the search results it provides are quite amazing. This new kid on the block is slowly challenging the search giant Google based on its unique selling proposition (USP), i.e., it does not track its users. The search results provided by it are very relevant, minus the clutter: there are not many ads and sidebars to fill up the space. It provides possible meanings for a query, which helps the user select the one matching his/her intention and get results accordingly. Similar to Google it also provides answers to mathematical queries and even answers queries like "weather" with the weather for our location. The definition tab simply provides the dictionary meaning of the keyword supplied. The bar under the query box is very relevant and provides categories for topics. It is populated depending upon the search query; for example, searching for a music band populates it with related videos, whereas searching for Thailand beaches displays images of the beaches, and it even responds to queries like "what rhymes with you" with relevant results. The rapid growth and incredible features make it real competition for major search engines like Google, Bing, and Yahoo, and it is slowly gaining the recognition it deserves. It is a must try for anyone who is enthusiastic about new ways of exploring the web.
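DuckDuckGo also exposes an Instant Answer API that returns this kind of direct answer as JSON; the sketch below assumes its public endpoint and commonly documented fields (Heading, AbstractText, RelatedTopics), which may change over time.

import requests

resp = requests.get(
    "https://api.duckduckgo.com/",
    params={"q": "Thailand beaches", "format": "json", "no_html": 1},
    timeout=10,
)
data = resp.json()

print("Heading:", data.get("Heading"))
print("Abstract:", data.get("AbstractText"))
for topic in data.get("RelatedTopics", [])[:3]:
    print("Related:", topic.get("Text"))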

FIGURE 4.7

DuckDuckGo results.




Kngine (http://kngine.com/)

Kngine is a great search engine with semantic capabilities. Unlike conventional search engines it allows us to ask questions and tries to answer them. We can input queries like "who was the president of Russia between 1990 and 2010" and it presents us with a list containing the names, images, term years, and other details related to Russia. Similarly, searching for "GDP of Italy" gives a great amount of relevant information in the form of data and graphs, minus the website links. So the next time a question pops up in our mind we can surely give it a try.



FIGURE 4.8 

Kngine result for semantic query. 


SOCIAL MEDIA SEARCH 

Social media is a vast platform and its impact is similarly vast, be it at a personal or a corporate level. Previously we discussed social media and how to search through some specific social network platforms; now let's check out some social media search engines and their capabilities.

SocialMention (http://socialmention.com/)

What SocialMention provides is basically real-time social media search and analysis, but what does that mean? Let's break it up into two parts: search and analysis. As for the search part, SocialMention searches various social media platforms like blogs, microblogs, social networks, events, etc., and even through the comments. The




results provided can be sorted by date and source and can be filtered for timelines like last hour, day, week, etc. Apart from this, SocialMention also provides an advanced search option, using which we can craft queries to get more precise results. Unlike conventional search engines, searching specifically through social media has a huge advantage: being able to understand the reach and intensity of the terms we are searching for in the content created by people. Through this we can better understand how people relate to these terms and to what level.

Now let’s move on to the analysis part, SocialMention not only provides the search 
results for our queries but also indicates the level of sentiments associated with it. It also 
displays the level of strength, passion, and reach of our query terms in the vast ocean 
of social media. Apart from this we can also see the top keywords, users, hashtags, and 
sources related to the query. One of the best features provided by this unique platform 
is that we can not only see this information, but also download it in the form of a CSV 
files. If all this was not sufficient, SocialMention also allows us to setup e-mail alerts for 
specific keywords. The kind of information this platform provides is not only helpful for 
personal use but can also have a huge impact for businesses as well; we can check how 
our brand is performing in the social arena and respond to it accordingly. 



FIGURE 4.9 

SocialMention displaying results and associated statistics. 

Social Searcher (http://www.social-searcher.com/) 

Social Searcher is yet another social media search engine. It uses Facebook, Twitter, and Google+ as its sources. The interface provided by this search engine is simple: under the search tab the search results are distributed into three tabs based on the source, and




under these tabs the posts are listed with a preview, which is very helpful in identifying the ones relevant for us. Similar to SocialMention, we can also set up e-mail alerts.

Under the analytics tab we can get sentiment analysis, users, keywords, domains, and much more. One of the more interesting of these is the popular tab, which lists the results with more interaction, such as likes, retweets, etc.

TWITTER 

Twitter is one of the most popular social networking sites, with a huge impact. Apart from its usual microblogging functionality, it also allows us to understand the reach and user base of any entity, which makes it a powerful tool for reconnaissance. Today it is widely used for market promotion as well as for analyzing the social landscape.

Topsy (http://topsy.com/)

Topsy is a tool which allows us to search and monitor Twitter. Using it we can check the trend of any keyword over Twitter and analyze its reach. The interface is pretty simple and looks like a conventional search engine, except that the results are based only on Twitter. The results presented can be narrowed down to various timeframes such as 1 day, 30 days, etc. We can also filter the results to see only images, tweets, links, videos, or influencers. There is another filter which allows us to see only results in specific languages. All in all, Topsy is a great tool for market monitoring of specific keywords.



FIGURE 4.10 

Topsy search. 




Trendsmap (http://trendsmap.com/)

Trendsmap is a great visual platform which shows trending topics from Twitter, in the form of keywords, hashtags, and Twitter handles, over a world map. It is a great platform which utilizes visual representation of the trends to understand what's hot in a specific region of the world. Apart from showing this visual form of information it also allows us to search through it by topic or location, which makes it easier to see only what we want.

Tweetbeep (http://tweetbeep.com/) 

In its own words, Tweetbeep is like Google Alerts for Twitter. It is a great service which allows us to monitor topics of interest on Twitter, such as a brand name, a product, updates related to companies, and even links. For market monitoring purposes it's a great tool which can help us respond quickly to topics of interest.

Twiangulate (http://twiangulate.com/search)

Twiangulate is a great tool which allows us to perform Twitter triangulation. Using it we can find the common people who are followers of, and are followed by, two different Twitter users. It also provides a feature to compare the reach of two users. It is a great tool to understand and compare the influence of different Twitter users.

SOURCE CODE SEARCH 

Most of the search engines we have used only look at the text visible on a web page, but there are some search engines which index the source code present on the internet. These kinds of search engines can be very helpful when we are looking for a specific technology used over the internet, such as a content management system like WordPress. Uses of such search engines include search engine optimization, competitive analysis, and keyword research for marketing, and they are only limited by the creativity of the user.

Due to storage and scalability issues there were earlier no service providers in this domain, but with technological advancements some options are opening up now; let's check out some of them.

NerdyData (http://nerdydata.com) 

NerdyData is one of the first of its kind, a unique search engine which allows us to search the code of web pages. Using the platform is pretty simple: go to the URL https://search.nerdydata.com/, enter a keyword like WordPress 3.7, and NerdyData will list the websites which contain that keyword in their source code. The results not only provide the URL of the website but also show the section of the code with the keyword highlighted, under the section Source Code Snippet. Apart from this there are various features such as contact author, fetch backlinks, and others which can be very helpful, but most of these are paid; still, the limited free usage of NerdyData is very useful and worth a try.





FIGURE 4.11 

NerdyData code search results. 

Ohloh Code (https://code.ohloh.net)

Ohloh Code is another great search engine for source code searching, but it's a bit different in that it searches open source code. What this means is that its source of information is the code residing in open repositories, such as Git repositories.

It provides great options to filter the results based on definitions, languages (programming), extensions, etc., through a bar on the left-hand side titled "Filter Code Results."

Searchcode (https://searchcode.com)

Similar to Ohloh, Searchcode also uses open source code repositories as its information source. The search filters provided by Searchcode are very helpful; some of them are repository, source, and language.

TECHNOLOGY INFORMATION 

In this special section on search engines we will work with some unique search engines which will help us gather information related to various technologies and much more. In this segment we will be dealing heavily with IP addresses and related terms, so it is advised to go through the section "Defining the basic terms" in the first chapter.







































Whois (http://whois.net/) 

Whois is basically a service which allows us to get information about the registrant of an internet resource such as a domain name. Whois.net provides a platform through which we can perform a Whois search for a domain or IP address. A Whois record usually consists of registrar info; date of registration and expiry; and registrant info such as name, e-mail address, etc.
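Whois itself is a simple text protocol: a client connects to a Whois server on TCP port 43, sends the query followed by a line break, and reads the reply until the connection closes. A minimal sketch in Python, assuming whois.iana.org as a generic starting server (IANA's server usually points to the registry responsible for the TLD):

    import socket

    def whois_query(query, server="whois.iana.org"):
        # The Whois protocol: connect to port 43, send the query, read the reply
        with socket.create_connection((server, 43), timeout=10) as sock:
            sock.sendall((query + "\r\n").encode("utf-8"))
            response = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                response += chunk
        return response.decode("utf-8", errors="ignore")

    print(whois_query("example.com"))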

Robtex (http://www.robtex.com) 

Robtex is a great tool to find out information about internet resources such as IP addresses, domain names, Autonomous System (AS) numbers, etc. The interface is pretty simple and straightforward. At the top left-hand corner is a search bar using which we can look up information. Searching for a domain gives us related information like IP address, route, AS number, location, etc. Similar information is provided for IP addresses, routes, etc.

W3dt (https://w3dt.net/)

W3dt is a great online resource for finding networking-related information. There are various sections which we can explore using this single platform. The first section is domain name system (DNS) tools, which allows us to perform various DNS-related queries such as DNS lookup, reverse DNS lookup, DNS server fingerprinting, etc. The second section provides tools related to network/internet such as port scan, traceroute, MX record retrieval, etc. The next section is web/HTTP, which consists of tools such as SSL certificate info, URL encode/decode, HTTP header retrieval, etc.; then comes the database lookups section, which includes MAC address lookup, Whois lookup, etc.; and in the end there are some general and ping-related tools. All in all it is a great set of tools which allows us to perform a huge list of useful functions under a single interface.
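Basic DNS and reverse DNS lookups of the kind W3dt offers can also be done locally with Python's standard library. A minimal sketch, using a placeholder domain:

    import socket

    domain = "example.com"

    # Forward DNS lookup: domain name to IP address
    ip = socket.gethostbyname(domain)
    print(domain, "resolves to", ip)

    # Reverse DNS lookup: IP address back to a host name (PTR record),
    # which may fail if no reverse record is configured
    try:
        host, aliases, addresses = socket.gethostbyaddr(ip)
        print(ip, "reverse-resolves to", host)
    except socket.herror:
        print("No reverse DNS record for", ip)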

Shodan (http://www.shodanhq.com/) 

So far we have used various types of search engines which help us to explore the web in all different ways. What we haven't encountered till now is an internet search engine (remember the difference between web and internet explained in chapter 1), or simply put, a computer search engine. Shodan is a computer search engine which scans the internet and grabs the service banner based on IP address and port. It allows us to search this information using IP addresses, country filters, and much more. Using it we can find simple information such as websites using a specific type of web server like Internet Information Services (IIS) or Apache, and also information which can be quite sensitive, such as IP cameras without authentication or SCADA systems exposed over the internet.

The free version without registration provides very limited information, which can be mitigated a bit using a registered account, yet it is sufficient to understand the power of this unique search engine. We can also utilize the power of this tool through a browser add-on or through its application programming interface (API). Shodan has a very active development history and comes up with new features all the time, so we can expect much more from it in the future.
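The API mentioned above is wrapped by the official shodan Python library (installable with pip install shodan). A minimal sketch, assuming a valid API key obtained from a registered account (the key string below is a placeholder, and depending on the account type some queries and filters may require a paid plan):

    import shodan

    API_KEY = "YOUR_SHODAN_API_KEY"  # placeholder, obtained after registering
    api = shodan.Shodan(API_KEY)

    # Search FTP banners, similar to the port:21 query shown in Figure 4.12
    results = api.search("port:21")
    print("Total results:", results["total"])

    for match in results["matches"][:5]:
        # Each match carries the banner plus metadata such as IP and organization
        print(match["ip_str"], match.get("org"), "-", match["data"][:60])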




W <3, SHODAN • Computer Sec x ^ ’ 

' 

"SfisTT 2 T 

<- C Q www.shodanhq.com/search?q=+port%3A21 


☆ = 1 

11 i:i Appi CD People CD Company CD Social Media Q] News C] Competitive Intcllig... C] Search Engines CD Chrome Addons 


Shodan Exploits Scanhub Maps Blog Anniversary Promotion 

SHODAN 


Top Countries 

United States 

China 

Germany 

Thailand 

Italy 


5,181.467 

1,975,627 

1,725.496 

1,281.427 

1,157,200 


188.143.91.138 

Dtd Ltd. 

AOSedOft 18,07 2014 


Results 1 • 10 Ot about 29278560 for port:21 


91.206.251.5 

Peter 6. J. Ourieux 

Added on 18.07.20H 

II 

S1-20C-2S1 •S-powered-by.dtgdeuvefi.be 


220PrcFTPD 1J.4»Str.tr (wVjteadipleu-.tabe) 91.206.2? 1.51 
530 Login sownect. 

314-Tb* follovnaj coxnnuaii »e recopuzed (• ■ 

214-CWD XC1VD COUP XCUP SMNT* QUIT PORT PASV 
214.EPRT EPSV ALLO* RNFRRNTO DELE MDTM RMD 
214-XRMD MKD XMKD PVVD XPWD SIZE SYST HELP 
214-NOOP FEAT OPTS AUTH* CCC* CONF* ENC* MIC* 
214-P3SZ* PROT* TYPE 8TKU MODE RETR STOR... 


220 Akamai Content Storage FT? Str.tr 
530 Login incorrect. 

214-Ti>» following comman4i are rtccjnirtd (* »•'» onunplemtnted) 
OVD XCWD COUP XCUP SMNT* QUIT PORT PASV 
EPRT EPSV ALLO* RNFR RNTO DELE MDTM RMD 
XRMD MKD XMKD PUD XPWD SIZE SYST HELP 
NOOP FEAT OPTS AUTH CCC* CONF* ENC* MIC* 

PBSZ PROT TYPE STRU MODE RETR STOR STOV 
APPE REST ABOR USER PASS. 


220 FTP Server (ZyWALL USG 300) [: «F1SS14J.91.13S] 

530 Login incorrect 

re recognized (• * \ uuscplemented): 


Hurricane 

LABS 


Celebrating 3 
years of 
Shodan 





FIGURE 4.12 

Shodan results for port 21. 


WayBack Machine (http://archive.org/web/web.php) 

Internet Archive WayBack Machine is a great resource to look up how a website looked in the past. Simply type the website address into the search bar and it will return a timeline with the available snapshots highlighted on a calendar. Hovering over these highlighted dates on the calendar will present a link to the snapshot. This is a great tool to analyze how a website has evolved and thus monitor its past growth. It can also be helpful to retrieve information from a website which was available in the past but is not anymore.
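The Internet Archive also offers a small JSON endpoint for checking whether a snapshot of a given URL exists. A minimal sketch, assuming the publicly documented availability API at archive.org/wayback/available (the optional timestamp picks the closest snapshot to that date):

    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    def closest_snapshot(url, timestamp=None):
        params = {"url": url}
        if timestamp:
            params["timestamp"] = timestamp  # e.g. "20100101" for snapshots near 2010
        api = "https://archive.org/wayback/available?" + urlencode(params)
        with urlopen(api) as response:
            data = json.loads(response.read().decode("utf-8"))
        snapshot = data.get("archived_snapshots", {}).get("closest")
        return snapshot["url"] if snapshot else None

    print(closest_snapshot("example.com", "20100101"))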


REVERSE IMAGE SEARCH 

We all are familiar with the phrase "A picture is worth a thousand words" and its veracity, and are also aware of platforms like Google Images (http://images.google.com), Flickr (https://www.flickr.com/), and Deviantart (http://www.deviantart.com/), which provide us images for the keywords we supply. Usually when we need to look up some information, we have a keyword or a set of them in the form of text, and following the same lead the search engines we have dealt with till now take text as an input and get us the results. But in case we have an image and we want to see where it appears on the web, where do we go? This is where reverse image search engines come in, which take an image as input and look up its appearances on the web. Let's get familiar with some of these.




Google Images (http://images.google.com/) 

We all are aware that Google allows us to search the web for images, but what many of us are unaware of is that it also allows us to perform a reverse image search. We simply need to go to the URL http://images.google.com, click on the camera icon, and provide the URL of an image on the web or upload a locally stored image file; we can also drag and drop an image file into the search bar, and voila, Google comes up with links to the pages containing that or similar images on the web.


^ C S https://www.google.com/search?tbs-sbi A^.1^lZZisHUQIGAUWxBSx8MORWv-kcrmPkVnKdgcTafQ0lL_la70QpM3OqpoeDrv , u , 

::: A PP* CD People CD Company Qj Social Media Q New* CD Competitive Intellig — CD Search Engines CD Chrome Addon* 


Google |B mona-iisa.jpg | mona lisa leonardo da vinci 


sa $ 


Web Images News Shopping Maps More * Search tools 


About 1,140 results (1.27 seconds) 

mage si 
>83$ * i 


i 


Image size: 
2835«4289 


Find other sizes of this image: 
All sizes • Medium • Large 


Best guess for this image mona lisa leonardo da vinci 


Mona Lisa - Wikipedia, the free encyclopedia 

en wikipedia org/wiki/Mona_Lisa w 

The Mona Lisa (Monna Lisa or La Gioconda in Italian, La Joconde in French) is a half- 
length portrait of a woman by the Italian artist Leonardo da Vinci, which 

Leonardo's "Mona Lisa" - Smarthistory 

smarthistory.khanacademy.org/leonardo-mona-lisa.hlml w 

Smarthistory conversation about one of art history's most famous paintings. Leonardo da 
Vinci's Mona Lisa 


Visually similar images 


Report images 


Mona Lisa 



Artwork 

The Mona Lisa is a half- 
length portrait of a woman 
by the Italian artist 
Leonardo da Vinci, which 
has been acclaimed as "the 
best known, the most 
visited, the most written about, the most 
sung about, the most parodied work of art 
in the world " Wikipedia 

Artist: Leonardo da Vinci 

Location: The Louvre (since 1797) 

Subject: Lisa del Giocondo 

Created: 1503-1517 

Dimensions: 2' 6" x V ST (77 cm x 53 cm) 

Periods: Italian Renaissance. The 

Renaissance 


Pg~"‘g r/-K 4 nr 


FIGURE 4.13 

Google reverse image search. 


TinEye (https://www.tineye.com/)

TinEye is another reverse image search engine and has a huge database of images. Similar to Google Images, searching on TinEye is very simple: we can provide the URL of the image, upload it, or perform a drag and drop. TinEye also provides browser plugins for major browsers, which make the task much easier. Though the results of TinEye are not as comprehensive as Google Images, it provides a great platform for the task and must be tried.

ImageRaider (http://www.imageraider.com/)

Last but not least in this list is ImageRaider. ImageRaider simply lists the results domain wise. If a domain contains more than one occurrence of the image, it indicates that as well, and the links to those images are listed under the domain name.

Reverse image search can be very helpful to find out more about someone when we are hitting dead-ends using conventional methods. As many people use the same profile picture across different platforms, performing a reverse image search can lead us to other platforms where the user has created a profile and which may hold previously undiscovered information.

MISCELLANEOUS 

We have dealt with a huge list of search engines which specialize in their domain and are popular within a community. In this section we will be dealing with some different types of search platforms which are lesser known but serve unique purposes and are very helpful in special cases.

DataMarket (http://datamarket.com/)

DataMarket is an open portal which consists of large data sets and presents the data nicely through visualizations. The simple search feature provides results for global topics with a list of different visualizations related to the topic; for example, searching for the keyword gold would provide results such as gold statistics, import/export of gold, and much more. The results page consists of a bar on the left which provides a list of filters using which the listed results can be narrowed down. It also allows us to upload our own data and create visualizations from it. Refer to the link http://datamarket.com/topic/list/ for a huge list of topics on which DataMarket provides information.

WolframAlpha (http://www.wolframalpha.com/)

In this chapter we have learned about various search engines which take some value as input and provide us with links which might contain the answer to the questions we are actually looking for, but what we are going to learn about now is not a search engine but a computational knowledge engine. What this means is that it takes our queries as input but does not provide us with URLs to websites containing the information; instead it tries to understand our natural language queries and, based upon an organized data set, provides a factual answer to them in the form of text and sometimes an apposite visualization.

Say, for example, we want to know the purpose of the .mil domain; we can simply type in the query "what is the purpose of the .mil internet domain?" and get the results. To get words starting with a and ending with e, a query like "words starting with a and ending with e" would give us the results, and we can even check the net worth of Warren Buffett with a query like "Warren Buffett net worth." For more examples of the queries of various domains that WolframAlpha is able to answer, check out the page http://www.wolframalpha.com/examples/.
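WolframAlpha also offers a developer API for the same kind of factual queries. The sketch below is illustrative only: the "short answer" endpoint and its parameters are assumptions based on WolframAlpha's public developer documentation, and the application ID (YOUR_APP_ID) is a placeholder that has to be obtained from their developer portal.

    from urllib.request import urlopen
    from urllib.parse import quote_plus

    APP_ID = "YOUR_APP_ID"  # placeholder, issued by the WolframAlpha developer portal

    def short_answer(question):
        # Assumed short-answer endpoint returning a plain-text result
        url = ("http://api.wolframalpha.com/v1/result?appid=" + APP_ID +
               "&i=" + quote_plus(question))
        with urlopen(url) as response:
            return response.read().decode("utf-8")

    print(short_answer("Warren Buffett net worth"))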





FIGURE 4.14 

WolframAlpha result. 

Addictomatic (http://addictomatic.com)

Usually we visit various different platforms to search for information related to a topic, but Addictomatic aggregates various news and media sources to create a single dashboard for any topic of our interest. The aggregated content is displayed in various sections depending upon the source. It also allows us to move these sections around depending upon our preference for better readability.

Carrot2 (http://search.carrot2.org/stable/search)

Carrot2 is a search results clustering engine; what this means is that it takes search results from other search engines and organizes them into topics using its search results clustering algorithms. Its unique capability to cluster results into topics allows us to get a better understanding of them and their associated terms. These clusters are also represented in different interesting forms such as folders, circles, and FoamTree. Carrot2 can be used through its web interface, which can be accessed using the URL http://search.carrot2.org/, and also through a software application which can be downloaded from http://project.carrot2.org/download.html.
























FIGURE 4.15 

Carrot2 search result cluster. 

Boardreader (http://boardreader.com/)

Boards and forums are rich sources of information as a lot of interaction and Q&A goes on in places like this. Members of such platforms range from newbies to experts in the domain to which the forum is related. In places like this we can get answers to questions which are difficult to find elsewhere, as they purely comprise user-generated content, but how do we search them? Here is the answer: Boardreader. It allows us to search forums to get results which contain content with human interaction. It also displays a trend graph of the search query keyword to show the amount of activity related to it. The advanced search features provided by it, such as sort by relevance, occurrence between specific dates, domain-specific search, etc., add to its already incredible feature set.

Omgili (http://omgili.com/)

Similar to Boardreader, Omgili is also a forum and boards search engine. It displays the results in the form of broad bars, and these bars contain information such as date, number of posts, author, etc. which can be helpful in estimating the relevance of the result. One such piece of information is Thread Info, which provides further information about a thread such as forum name, number of authors, and replies to the thread, without actually visiting the original thread forum page. It also allows us to filter the results based upon the timeline of their occurrence, such as past month, week, day, etc.




Truecaller (http://www.truecaller.com)

Almost everyone who uses or has ever used a smartphone is familiar with the concept of mobile applications, better known as apps, and many if not most of them have used the famous app called Truecaller, which helps to identify the person behind a phone number. What many of us are unaware of is that it can also be used through a web browser. Truecaller simply allows us to search using a phone number and provides the user's details from its crowdsourced database.


Other search engines worth trying:

• Meta search engine
  • Search (http://www.search.com/)
• People search
  • ZabaSearch (http://www.zabasearch.com/)
• Company search
  • Hoovers (http://www.hoovers.com/)
  • Kompass (http://kompass.com/)
• Semantic
  • Sensebot (http://www.sensebot.net/)
• Social media search
  • Whostalkin (http://www.whostalkin.com/)
• Twitter search
  • Mentionmapp (http://mentionmapp.com/)
  • SocialCollider (http://socialcollider.net/)
  • GeoChirp (http://www.geochirp.com/)
  • Twitterfall (http://beta.twitterfall.com/)
• Source code search
  • Meanpath (https://meanpath.com)
• Technology search
  • Netcraft (http://www.netcraft.com/)
  • Serversniff (http://serversniff.net)
• Reverse image search
  • NerdyData image search (https://search.nerdydata.com/images)
• Miscellaneous
  • Freebase (http://www.freebase.com/)


So we discussed a huge list of search engines under various categories which are not conventionally used, but as we have already seen, they are very useful in different scenarios. We are all addicted to Google for all our searching needs, and being one of the best in its domain it has served our purpose most of the time, but sometimes we need different and specific answers to our queries, and then we need these kinds of search engines. This list tries to cover most aspects of daily searching needs, yet surely there are other platforms which need to be found and used to solve specific problems.

In this chapter we learned about various unconventional search engines, their features, and functionalities, but what about the conventional search engines like Google, Bing, Yahoo, etc. that we use on a daily basis? Oh! We already know how to use them, or do we? The search engines we use daily have various advanced features which many users are unaware of. These features allow users to filter out the results so that we can get more information and less noise. In the next chapter we will be dealing with conventional search engines and will learn how to use them effectively to perform better searches and get specific results.





CHAPTER 5

Advanced Web Searching


INFORMATION IN THIS CHAPTER 

• Search Engines 

• Conventional Search Engines 

• Advanced Search Operators of various Search Engines 

• Examples and Usage 


INTRODUCTION 

In the last chapter we dealt with some special platforms which allowed us to perform domain-specific searches; now let's go into the depths of the conventional search engines which we use on a daily basis and check out how we can utilize them more efficiently. In this chapter, basically, we will understand the working and advanced search features of some of the well-known search engines and see what functionalities and filters they provide to serve us better.

We already have a basic idea about what a search engine is and how it crawls over the web to collect information, which is then indexed to provide us with search results. Let's revise it once and understand it in more depth.

Web pages as we see them are not actually what they look like. Web pages basically contain HyperText Markup Language (HTML) code and, most of the time, some JavaScript and other scripting languages. HTML is a markup language and uses tags to structure the information; for example, the tag <h1></h1> is used to create a heading. When we receive this HTML code from the server, our browser interprets it and displays the web page in its rendered form. To check the client-side source code of a web page, simply press Ctrl+U in the browser with the page open.

Once the web crawler of a search engine reaches a web page, it goes through its HTML code. Most of the time these pages also contain links to other pages, which are used by the crawlers to move further in their quest to collect data. The content crawled by the web crawler is then stored and indexed by the search engine based on a variety of factors. The pages are ranked based upon their structure (as defined in HTML), the keywords used, interlinking of the pages, media present on the page, and many other details. Once a page has been crawled and indexed it is ready to be presented to the user of the search engine depending upon the query.




Once a page has been crawled, the job of the crawler does not finish for that page. The crawler is scheduled to perform the complete process again after a specific time, as the content of the page might change. So this process keeps on going, and as new pages are linked they are also crawled and indexed.
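To make the fetch-parse-follow cycle described above concrete, here is a minimal sketch of a single crawl step in Python using only the standard library: it downloads a page, parses the HTML, and collects the links a crawler would queue up next. The URL used is just a placeholder.

    from html.parser import HTMLParser
    from urllib.request import urlopen
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        """Collects the href value of every <a> tag found on a page."""

        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        # Resolve relative links against the page URL
                        self.links.append(urljoin(self.base_url, value))

    def crawl_once(url):
        # Fetch the raw HTML, just as a crawler would
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="ignore")
        parser = LinkExtractor(url)
        parser.feed(html)
        return parser.links

    for link in crawl_once("http://example.com")[:10]:
        print(link)

A real crawler would add these links to a queue, respect robots.txt, and revisit pages periodically, exactly as described above.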

Search engine is a huge industry in itself which helps us in our web exploration, 
but there is another industry which depends directly on search engines and that is 
search engine optimization (SEO). SEO is basically about increasing the rank of 
a website/web page or in other words to bring it up to the starting result pages of a 
search engine. The motivation behind this is that it will increase the visibility of that 
page/site and hence will get more traffic which can be helpful from a commercial or 
personal point of view. 

Now that we have a good understanding of search engines and how they operate, let's move ahead and see how we can better use some of the conventional ones.


GOOGLE 

Google is one of the most widely used search engines and is the starting point for web exploration for most of us. Initially Google search was accessible through a very simple interface and provided limited information. Apart from the search box there were some special search links, links about the company, and a subscription box where we could enter our email to get updates. There were no ads, no different language options, no login, etc.

It's not only the look and feel of the interface that has changed over the years but also the functionalities. It has evolved from providing simple web links to pages containing relevant information into a whole bunch of related tools which not only allow us to search different media types and categories but also narrow down these results using various filters. Today there are various categories of search results such as images, news, maps, videos, and much more. This plethora of functionalities provided by Google has certainly made our lives much easier and made the act of finding information on the web a piece of cake. Still, sometimes we face difficulty in finding the exact information we are looking for, and the main reason behind it is not the lack of information but, on the contrary, the abundance of it.

Let's move on to see how we perform a Google search and how to improve it. Whenever we need to search something in Google we simply think about some of the keywords associated with it, type them into the search bar, and hit Enter. Based upon its indexing, Google simply provides us with the associated resources. Now if we want to get better results or filter the existing results based upon various factors, we need to use Google advanced search operators. Let's have a look at these operators and their usage.

site:

It fetches results only for the site provided. It is very useful when we want to limit our search to a specific domain. It can be used with another keyword, and Google will bring back related pages from the specified site. From an information security perspective it is very useful for finding the different subdomains related to a particular domain.
Examples: site:gov, site:house.gov 



FIGURE 5.1 

Google “site” operator usage. 

inurl:

This operator allows us to look for keywords in the uniform resource locator (URL) of a site. It is useful for finding pages which follow a usual keyword for specific pages, such as contact us. Generally, as the URL contains some keywords associated with the body content, it helps us to find the equivalent page for the keyword we are searching for.

Example: inurl:hack

allinurl:

Similar to "inurl," this operator allows us to look for multiple keywords in the URL. This also enhances the chances of getting quality content relevant to what we are looking for.

Example: allinurl:hack security

intext:

This operator makes sure that the keyword specified is present in the text of the page. Sometimes, just for the sake of SEO, we find pages which only contain keywords to enhance the page rank but not the associated content. In that case we can use this query parameter to get the appropriate content from a page for the keyword we are looking for.

Example: intext:hack

allintext:

Similar to "intext," this operator allows us to look for multiple keywords in the text. As we discussed earlier, the feature of searching for multiple keywords always enhances the content quality in the result pages.

Example: allintext:data marketing

intitle:

It allows us to restrict the results by the keywords present in the title of the pages (title tag: <title>XYZ</title>). It can be helpful to identify pages which follow a convention for the title of the pages, such as directory listings with the keywords "index of," and most sites put keywords in the title to improve their page rank. So this query parameter always helps to search for a particular keyword.

Example: intitle:blueocean

allintitle: 

This is the multiple keyword counterpart of “intitle” operator. 

Example: allintitle:blueocean market 

filetype:

This operator is used to find files of a specific kind. It supports multiple file types such as pdf, swf, kml, doc, svg, txt, etc. This operator comes in handy when we are only looking for a specific type of file on a specific domain.

Examples: filetype:pdf, site:xyz.com filetype:doc

ext: 

The operator ext simply stands for extension and it works similar to the filetype 
operator. 

Example: ext:pdf 

define: 

This operator is used to find out the meaning of the keyword supplied. Google returns 
dictionary meaning and synonyms for the keyword. 

Example: define:data 

AROUND

This operator is helpful when we are looking for results which contain two different keywords in close association. It allows us to restrict the maximum number of words between the two keywords in the search results.

Example: A AROUND(6) Z

AND

A simple Boolean operator which makes sure the keywords on both of its sides are present in the search results.

Example: data AND market


OR

Another Boolean operator which provides search results containing either of the keywords present on the two sides of the operator.

Example: data OR intelligence

NOT

Yet another Boolean operator, which excludes the search results that contain the keyword followed by it.

Example: lotus NOT flower


""

This operator is useful when we need to search for results which contain the provided keywords in the exact sequence. For example, we can search for pages which contain quotes or some lyrics.

Example: “time is precious” 


-

This operator excludes the search results which contain the keyword that follows it (no space between the operator and the keyword).

Example: lotus -flower


* 

This wildcard operator is used as a generic placeholder for the unknown term. 
We can use this to get quotes which we partially remember or to check variants 
of one. 

Example: “* is precious” 


..

This special operator is used to provide a number range. It is quite useful to enforce a price range, time range (date), etc.

Example: japan volcano 1990..2000




info:

The info operator provides the information that Google has on a specific domain. Links to different types of information are present in the results, such as cache, similar websites, etc.

Example: info:elsevier.com

related: 

This operator is used to find out other web pages similar to the provided domain. It 
is very helpful when we are looking for websites which provide similar services to a 
website or to find the competitors of it. 

Example: related:elsevier.com 

cache: 

This operator redirects to the latest cache of the page that Google has crawled. In case we 
don’t get a result for a website which was accessible earlier, this is a good option to try. 

Example: cache:elsevier.com 

Advanced Google search can also be performed using the page 
http://www.google.com/advanced_search, which allows us to perform restricted 
search without using the operators mentioned above. 


FIGURE 5.2

Google advanced search page.

Apart from the operators, Google also provides some operations which allow us to check information about current events and perform some other useful things. Some examples are:




time

Simply entering this keyword displays the current time of the location we are residing in. We can also use the name of a region to get its current time.

Example: time france

weather 

This keyword shows the current weather conditions of our location. Similar to the "time" keyword, we can also use it to get the weather conditions of a different region.

Example: weather Sweden 

Calculator 

Google also solves mathematical equations and provides a calculator.

Example: 39*(9823-312)+44/3 

Convertor 

Google can be used to perform conversions for different types of units like measurement units, currency, time, etc.

Example: 6 feet in meters 

This is not all; sometimes Google also shows relevant information related to global events as and when they happen, for example, the FIFA World Cup.

Apart from searching the web in general, Google also allows us to search specific categories such as images, news, videos, etc. All these categories, including web, have some common and some specific search filters of their own. These options can simply be accessed by clicking on the "Search tools" tab just below the search bar. We can find options which allow us to restrict the results based upon the country and time of publishing for web; for images there are options like the color of the image, its type, usage rights, etc., and similarly other relevant filters for the other categories. These options can be very helpful in finding the required information of a category as they are designed according to that specific category. For example, if we are looking for an old photograph of something it is a good idea to see only the results which are black and white.

The operators we discussed are certainly very useful for anyone who needs to find some information on the web, but the InfoSec community has taken it to the next level. These simple and innocent-looking operators are widely used in the cyber security industry to find and demonstrate how critical and compromising information can be retrieved without even touching the target system. This technique of using Google search engine operators to find such information is termed "Google Hacking."

When it comes to "Google Hacking" one name that comes to mind is Johnny Long. Johnny was an early adopter and pioneer in the field of creating such Google queries which could provide sensitive information related to the target. These queries are widely known as Google Dorks.

Let's understand how this technique works. We saw a number of operators which can narrow down search results to a specific domain, filetype, title value, etc. In Google Hacking our motive is to find sensitive information related to the target; for this, people have come up with various signatures for different files and pages which are known to contain such information. For example, let's say we know the name of a sensitive directory which should not be directly accessible to any user publicly, but which remains public by default after the installation of the related application. Now if we want to find the sites which have not changed the accessibility of this directory, we can simply use the query "inurl:/sensitive_directory_name/" and we will get a bunch of websites which haven't changed the setting. If we want to further narrow it down to a specific website, we can combine the query with the operator "site," as in "site:targetdomain.com inurl:/sensitive_directory_name/." Similarly we can find sensitive files existing on a website by using the operators "site" and "filetype" in combination, as shown in the sketch below.
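A small sketch of how such dorks can be composed programmatically and turned into Google search URLs for manual review. The target domain and the specific signatures here are purely illustrative assumptions, and automating actual queries against Google may violate its terms of service, so the snippet only prints the URLs.

    from urllib.parse import quote_plus

    TARGET = "targetdomain.com"  # hypothetical target used only for illustration

    # Dorks built from the operators discussed above
    dorks = [
        'site:{} filetype:pdf'.format(TARGET),                       # documents hosted on the target
        'site:{} inurl:/sensitive_directory_name/'.format(TARGET),   # exposed directory
        'site:{} intitle:"index of"'.format(TARGET),                 # directory listings
    ]

    for dork in dorks:
        # Each URL can be opened in a browser to review the results manually
        print("https://www.google.com/search?q=" + quote_plus(dork))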

Let's take another example of Google Hacking which can help us to discover a high severity vulnerability in a website. Many developers use Flash to make websites more interactive and visually appealing. Small web format (SWF) is a Flash file format used to create such multimedia. Now, there are many SWF players known to be vulnerable to cross-site scripting (XSS), which could lead to an account compromise. If we want to find out whether the target domain is vulnerable to such an attack, we can simply put in the query "site:targetdomain.com filetype:swf SWFPlayer_signature_keyword" and test the resulting pages using publicly available payloads to verify. There are a huge number of signatures to find various types of pages such as sensitive directories, web server identification, files containing usernames/passwords, admin login pages, and much more.

The Google Hacking Database created by Johnny Long can be found at http://www.hackersforcharity.org/ghdb/; though it is not updated anymore, it is still a great place to understand and learn how we can use Google to find sensitive information. A regularly updated version can be found at http://www.exploit-db.com/google-dorks/.



FIGURE 5.3 

Google hacking database-www.exploit-db.com/google-dorks/. 




BING 

Microsoft has been providing search engine solutions for a long time, and they have been known by different names. Bing is the latest and most feature-rich search engine in this series. Unlike its predecessors, Bing provides a cleaner and simpler interface. As Microsoft covers a major part of the operating system market, the general perception is that Bing is just another side-product from a technology giant, and hence many do not take it seriously. But that is wrong. Like all the search engines Bing also has some unique features that will force you to use it when you need those features, and those features certainly leave a mark on how we search. We will discuss not only the special features but also the general operators, which will allow us to understand the search engine and its functionalities.

+

This operator works quite similarly in all the search engines. It allows a user to forcefully include single or multiple keywords in a search query. Bing will make sure that the keywords coming after the + operator are present in the result pages.

Example: power +search


-

This operator is also known as the NOT operator. It is used to exclude something from a set of things, such as excluding a cuisine.

Example: Italian food -pizza

Here Bing will display all the Italian foods available but not pizza. We can write this in another form which fetches the same result, such as the example below.

Example: Italian food NOT pizza

""

This works the same in most search engines. It is used to search for the exact phrase placed inside the double quotation marks.

Example: "How to do Power Searching?"


|

This is also known as the OR operator, mostly used for getting results for one of the two keywords, or one of the many keywords, joined with this operator.

Example: ios | android
ios OR android




&

This operator is also known as the AND operator. This is the default search operator: if we do nothing and just add multiple keywords, Bing will perform an AND search in the backend and give us the result.

Example: power AND search
power & search

As this is the default search, it's very important to keep in mind that unless we write OR and NOT in capitals, Bing won't understand them as operators.

()

This can be called the group operator. Grouping of Bing operators is supported in the following order of precedence:

()
NOT / -
AND / &
OR / |

As parentheses have the top priority, we can put lower priority operators such as OR inside them and create a group query to execute the lower priority operators first.

Example: android phone AND (nexus OR xperia)

site: 

This operator helps to search for a particular keyword within a specific website. It works quite the same in most of the search engines.

Example: site:owasp.org clickjacking 

filetype:

This allows a user to search for data in a specific type of file. Bing supports most common file types; by and large, the types supported by Google are also supported by Bing.

Example: hack filetype:pdf


ip:

This unique operator provided by Bing allows us to search web pages based upon an IP address. Using it we can perform a reverse IP search, which means it allows us to look for pages hosted on the specified IP.


Example: ip:176.65.66.66
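Since the ip: operator is just part of the query string, reverse IP lookups like the one above are easy to prepare for a whole list of addresses. A minimal sketch that simply builds the Bing search URLs for manual review (the IP addresses are placeholders):

    from urllib.parse import quote_plus

    ips = ["176.65.66.66", "93.184.216.34"]  # placeholder addresses

    for ip in ips:
        query = "ip:" + ip
        # Open each URL in a browser to see the pages Bing has indexed on that host
        print("https://www.bing.com/search?q=" + quote_plus(query))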





FIGURE 5.4 

Bing "ip" search.


feed: 

Yet another unique operator provided by Bing is feed, which allows us to look for 
web feed pages containing the provided keyword. 

One other feature that Bing provides is to perform social search using the page 
https://www.bing.com/explore/social. It allows us to connect our social network 
accounts with Bing and perform search within them. 



FIGURE 5.5 

Bing social search. 




YAHOO 

Yahoo is one of the oldest players in the search engine arena and has been quite popular. The Yahoo search page also carries a lot of content such as news, trending topics, weather, financial information, and much more. Earlier Yahoo utilized third party services to power its search capabilities, later it shifted to become independent, and it has now once again joined forces with Bing for its search services. Though there is not too much that Yahoo offers in terms of advanced searching as compared to other search engines, the operators it does provide are worth trying. Let's see some of the operators that can be useful.

+ 

This operator is used to make sure the search results contain the keyword followed by it. 
Example: +data 


-

Opposite to the "+" operator, this operator is used to exclude a specific keyword from the search results.

Example: -info


OR 

This operator allows us to get results for either of the keywords supplied. 

Example: data OR info 

site:

This operator allows restricting the results only to the site provided; we will only get to see links from the specified website. There are two other operators, domain and hostname, which work like this operator but do not provide results as accurate or in-depth. Their usage is similar to that of the "site" operator.

Example: site:elsevier.com

link:

This is another interesting operator which allows us to look up web pages that link to the specific web page provided. While using this operator, do keep in mind to provide the URL with the protocol (http:// or https://).





Sltow results with 


Sorted By |rX»vwKt *| 


« lest 30 dm * 1 

Published between | jut » | |t2 »| ■«* |Aug *[ [11*1 

TipiYou can »«*rch with.-i • c*rt»m tima pared or apaafr your own data or ranpa of dataa 


Source 


□ 


Tipi You can aaarcK for nawa from a fpacific piondti. a ® “Naw Vortc Tunas*. 


Categories Search only for pages within; 

lh - ‘ 11 ■'••• r .irir'' 1 nai-.v mi vmi. cv 


FIGURE 5.7 

Yahoo advanced search page. 


YANDEX

Yandex is a Russian search engine and is not very popular outside the country, but it's one of the most powerful search engines available. Like Google, Bing, and Yahoo it has its own unique keywords and indexed data. Yandex is the most popular and widely used search engine in Russia and the fourth largest search engine in the world. Apart from Russia, it is also used in countries like Ukraine, Kazakhstan, Turkey, and Belarus. It is also a very underrated search engine, as its use is mostly limited to specific countries, but in the security community we see it otherwise. Most people are either happy with their conventional search engine or they think all the information on the internet is available in the search engine they are using. The fact is that search engines like Yandex also have many unique features that can provide us with far more efficient results as compared to other search engines.

Here we will discuss how Yandex can be a game changer in searching data on the internet and how to use it efficiently.

As discussed earlier, like other search engines, Yandex has its own operators such as lang, parentheses, Boolean operators, and more. Let's get familiar with these operators and their usage.

+

This operator works quite the same for all the search engines. Here also, for Yandex, the + operator is used to make sure a keyword is included in the search result page. The keyword added after the + operator is the primary keyword in the search query; the results fetched by the search engine must contain that keyword.




Example: power /4 searching

Yandex will make sure that the result pages contain these two keywords within four words of each other, irrespective of keyword position. That means the order in which we put the keywords in the query might differ in the result page.

What if we need to fix the order? Yandex has a solution for that also: adding a + sign with the number.

Example: power /+4 searching

Adding the + operator before the number forces Yandex to respond only with pages where these two keywords appear in the same order and within a 4-word count.

What if we need the reverse of it? Let's say we need results where the keyword "searching" comes first and "power" after it within a 4-word count, and not vice versa. In that case a negative number comes in pretty handy; we can use the - sign to reverse what we just did without getting the vice versa result.

Example: power /-4 searching

This will only display pages which contain the keyword "searching" with "power" after it within a 4-word count.

Let's say we want to set up a radius or boundary for a keyword with respect to another; in that case we have to specify that keyword in the second position.

Example: power /(-3 +4) searching

Here we are setting up a radius for "searching" with respect to "power." This means a page is shown in the results only if "searching" is found within 3 words before or 4 words after "power."

This can be helpful when we are searching for two people's names. In that case we cannot guess which name will come first and which will come next, so it's better to create a radius for those two names, and the query will serve our purpose.

As we discussed a lot about word-based proximity search, now let's put some light on sentence-based proximity search. For sentence-based search we can use the Yandex && operator with this number operator.

Example: power && /4 searching

In this case we get result pages containing these two keywords within a 4-sentence distance, irrespective of the position of the keywords. That means either "power" may come first and "searching" after it, or vice versa.


!

This operator does something special, and it is one of my favorites. It gives a user the freedom to search only for a specific keyword, without similar word search or extended search. What happens in a general search is that if you search for a keyword, let's say AND, you will get some results showing only AND and then the results will extend to ANDroid or AMD and so on. If we want results only for the AND keyword, we use this operator.

Example: !and

This will restrict the search engine to provide results showing only pages which contain this particular keyword AND.

!!

It can be used to search for the dictionary form of the keyword.

Example: !!and


()

When we want to create a complex query with different keywords and operators, we can use these brackets to group them. As we have already used these brackets above, now we will see another example to understand their true power.




FIGURE 5.8 

Yandex complex query. 

Example: power && (+searching | !search)

Here the query will search for both sets of keywords, power searching and power search, but not both in the same result.


""

Now, what if we want to search for a particular string or set of keywords? Here this operator comes to the rescue. It is quite similar to Google's "". It allows a user to search for the exact keywords or string placed inside the double quotes.

Example: "What is OSINT?"

It will search for the exact string and, if available, will give us the results accordingly.


*

This operator can be referred to as the wildcard operator. Its use is quite the same in most of the search engines. It is used to fill in a missing keyword or suggest relevant keywords according to the other keywords used in the search query.

Example: osint is * of technology

It will auto fill the space where * is used to complete the query with relevant keywords; in this case that can be ocean or treasure or anything. We can also use this operator with double quotes to get a more efficient and accurate result.

Example: "OSINT is * of technology"


|

This is quite similar to the OR operator of Google. It allows us to provide different keywords where we want results for any of them. In a real-time scenario we can search for options using this operator. Let's say I want to buy a laptop and I have different options; in that case this operator comes into the picture.

Example: dell | toshiba | macbook

Here we can get results for any of these three options, but not all in one result.


<<

This is an unusual operator known as the non-ranking "AND." It is basically used to add additional keywords to the query without impacting the ranking of the websites in the result. We might not get to know what exactly it does by just going through its definition, so in simple words, it can be used to tag additional keywords to the query list without impacting the page rankings.

Example: power searching << OSINT

It can be used to additionally search for OSINT along with the other two keywords without impacting the page ranking in the result page.

title:

This is quite equivalent to "intitle." It can be used to search for pages with the keyword(s) specified after the title query parameter.

Example: title:osint




This will provide pages that contain OSINT in the title of the web page. Similarly 
we can use this title query parameter to search for more than one keyword. 

Example: title:(power searching) 

url:

This "url" search query parameter is also an add-on. It searches for the exact URL provided by the user in the Yandex database.

Example: url:http://attacker.in

Here Yandex will provide a result if and only if the URL has been crawled and indexed in its database.

inurl:

It can be used to search for keywords present in a URL, in other words for URL fragment search. This "inurl" query parameter works quite the same in all the search engines.

Example: inurl:osint

It will search for all the URLs that contain the osint keyword, no matter what the position of the keyword is.

mime:filetype

This query parameter is quite similar to the "filetype" query parameter of Google. It helps a user to search for a particular file type.

Example: osint mime:pdf



FIGURE 5.9 

Yandex file search. 




It will provide us all the PDF links that contain the osint keyword. Such queries are also easy to generate in bulk, as shown in the sketch after the following list. The file types supported by Yandex mime are

PDF, RTF, SWF, DOC, XLS, PPT, DOCX, PPTX, XLSX, ODT, ODS, ODP, ODG
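A minimal sketch that builds a Yandex query URL for each supported file type, assuming Yandex's standard search URL format (https://yandex.com/search/?text=...); as with the earlier examples, the URLs are printed for manual review rather than fetched automatically.

    from urllib.parse import quote_plus

    MIME_TYPES = ["pdf", "rtf", "swf", "doc", "xls", "ppt",
                  "docx", "pptx", "xlsx", "odt", "ods", "odp", "odg"]

    keyword = "osint"  # illustrative keyword

    for mime in MIME_TYPES:
        query = "{} mime:{}".format(keyword, mime)
        # Each URL searches for documents of one specific type containing the keyword
        print("https://yandex.com/search/?text=" + quote_plus(query))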

host: 

It can be used to search all the available hosts. It is mostly used by penetration testers.

Example: host:owasp.org

rhost:

It is quite similar to host, but "rhost" searches for reverse hosts. This can also be used by penetration testers to get all the reverse host details.

It can be used in two ways: one for subdomains, by using the wildcard operator * at the end, and another without it.


Example: rhost:org.owasp.*

rhost:org.owasp.www


site: 

This operator is like the best friend of a penetration tester or hacker. It is available in most search engines. It provides details of all the subdomains of the provided URL.

For penetration testers or hackers, finding the right place to look for vulnerabilities is most important. As in most cases the main sites are much more secure as compared to their subdomains, if an operator helps to simplify the process by providing the subdomain details to a hacker or penetration tester, then half the work is done. So the importance of this operator is definitely felt in the security industry.

Example: site:http://www.owasp.org

It will provide all the available subdomains of the domain owasp.org as well as all the pages.

date: 

This query can be used either to limit the search results to a specific date or, with a little enhancement in the query, to a specific period.

Example: date:201408*

In this case, the date format used is YYYYMMDD, but in place of the DD we used the wildcard operator, so we will get results limited to August 2014.




We can also limit the same to a particular date of August 2014 by changing the query a bit.

date:20140808

It will only show results belonging to that date.

We can also use "=" in place of ":" and it will still work the same. So the above queries can be changed to

date=201408*

date=20140808

As we discussed earlier, we can also limit the search results to a particular time period. Let's say we want to search something from a particular date till today. In that case we can use

date=>20140808

It will provide results from 8th August 2014 till today, but what if we want to limit both the start date and the end date? In that case also Yandex provides us a provision for specifying a range.

date=20140808..20140810

Here we will get the results from 8th August 2014 to 10th August 2014.


domain: 

It can be used to restrict the search results based on top level domains (TLDs). Mostly this type of domain search is done to get results from country-specific domains. Let's say we want to get the list of CERT-empanelled security service providing companies from different countries. In that case we can search using the country-specific domain extension; for New Zealand, for example, the TLD is nz, so we can craft a query like

Example: "cert empanelled company" domain:nz

lang: 

It can be used to search pages written in specific languages. 


Yandex supports some specific languages such as 
RU: Russian 
UK: Ukrainian 
BE: Belorussian 
EN: English 
FR: French 
DE: German 
KK: Kazakh 
TT: Tatar 
TR: Turkish 




Though we can always use Google Translate to translate a page from any language into English or any other language, this is an added feature provided by Yandex to fulfill the minimum requirements of the regions where Yandex is popularly used.

To search for a page we need to provide the short form of the language.

Example: power searching lang:en

It will search for pages in English that contain power searching.

cat: 

It is also something unique provided by Yandex. Cat stands for category. Yandex categorizes different things based on region id or topic id. Using cat we can search for results based on the region or topic assigned in the Yandex database.

The details of the regional codes: http://search.yaca.yandex.ru/geo.c2n.

The details of the topic codes: http://search.yaca.yandex.ru/cat.c2n.

Though these pages contain data in the Russian language, we can always use Google Translate to serve the purpose.

As we discussed in the beginning, Yandex is an underrated search engine, and some of its cool features are definitely going to leave a mark on us once we go through this chapter. One such feature is its advanced search GUI.

There are lazy people like me who want everything in a GUI so that they just have to customize everything by providing limited details and selecting some checkboxes or radio buttons. Yandex provides that at the link below:

http://www.yandex.com/search/advanced?&lr=10558

Here we just have to select what we want, and most importantly it covers most of the operators we discussed above. So go to the page, select what you want, and search efficiently using the GUI.

Definitely, after going through all these operators, we can easily feel the impact
of advanced search, or we can also use the term power search for it. Advanced
search provides a user with faster, more efficient, and more reliable data in the result.
It always reduces our manual effort to get the desired data. And the content quality
is also better in advanced search as we limit the search to what we are actually
looking for. It can be a country-specific domain search, a particular file type,
or content from a specific date. These things cannot be done easily with a simple
keyword search.

We are in an age where information is everything. The reliability factor then comes
into the picture, and if we want a bulk of reliable information from the net in a very
short time span then we need to focus on advanced search. We can use any conventional
search engine of our choice. Most of the search engines have quite similar
operators to serve the purpose, but there are some special features present; so look
for those special features and use different search engines for different customized
advanced searches.




So we learned about various search engines and their operators and how to utilize
these operators to search better and get precise results. For some operators we saw
their individual operation and how they can help to narrow down the results, and for
some we saw how they can be used with other operators to generate a great query
which directly gets us to what we want. Though there are some operators for different
search engines which work more or less in the same fashion, as the crawling
and indexing techniques of different platforms are different, it is worthwhile to check
which one of them provides better results depending upon our requirements. One
thing that we need to keep in mind is that the search providers keep on deprecating
operators or features which are not used frequently enough, and also some
functionalities are not available in some regions.

We saw how easily we can get the results that we actually want with the use
of some small but effective techniques. The impact of these techniques is not just
limited to finding out the links to websites, but if used creatively they can be
implemented in various fields. Apart from finding the information on the web, which
certainly is useful for everyone, these techniques can be used to find out details which
are profession specific. For example a marketing professional can scale the size of
the website of a competitor using the operator “site,” or a sales professional can find
out e-mails for a company using the wildcard operator “*@randomcompany.com.”
We also saw how search engine dorks are used by cyber security professionals to find
out sensitive and compromising information just by using some simple keywords and
operators. The takeaway here is not just to learn about the operators but also about
how we can use them creatively in our profession.

We have covered a lot about how to perform searches using different search
platforms in this and some previous chapters. Till now we have mainly focused on
browser-based applications, or we can say web applications. In the next chapter we
will move on and learn about various tools which need to be installed as applications
and provide us various features for extracting data related to various fields,
using various methods.




CHAPTER 6

OSINT Tools and
Techniques



INFORMATION IN THIS CHAPTER 

• OSINT Tools 

• Geolocation 

• Information Harvesting 

• Shodan 

• Search Diggity 

• Recon-ng 

• Yahoo Pipes 

• Maltego 


INTRODUCTION 

In the previous chapters we learned about the basics of the internet and effective ways
to search it. We went into great depths of searching, from social media to unconventional
search engines, and further learned about effective techniques to use regular search
engines. In this chapter we will move a step further and will discuss some of
the automated tools and web-based services which are used frequently to perform
reconnaissance by professionals of various intelligence-related domains, especially
information security. We will start from the installation part to understanding their
interface and will further learn about their functionality and usage. Some of these
tools provide a rich graphical user interface (GUI) and some of them are command
line interface (CLI) based, but don’t judge them by their interface but by their
functionality and relevance to our field of work.

Before moving any further we must install the dependencies for these tools so that 
we don’t have to face any issues during their installation and usage. The packages 
we need are 

• Java latest version 

• Python 2.7 

• Microsoft .NET Framework v4 

We simply need to download the relevant package depending upon our system 
configuration and we are good to go. 









CREEPY 

Most of us are addicted to social networks, and image sharing is one of the most utilized
features of these platforms. But sometimes when we share these pictures it’s not just the
image that we are sharing but possibly also the exact location where that picture was taken.

Creepy is a Python application which can extract this information and display
the geolocation on a map. Currently Creepy supports searches on Twitter, Flickr, and
Instagram. It extracts the geolocation based on EXIF information stored in images,
geolocation information available through the application programming interface (API),
and some other techniques.
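To get a feel for the EXIF part of what Creepy does, the sketch below pulls GPS
coordinates out of a single photo. It assumes the Pillow (PIL) imaging library is
installed and uses a placeholder file name; Creepy's own implementation is, of course,
considerably more involved.

# Read the GPS latitude/longitude stored in a JPEG's EXIF block (Python 2.7).
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def _rational(x):
    # EXIF rationals appear either as (numerator, denominator) tuples or,
    # in newer Pillow versions, as IFDRational objects.
    try:
        return x[0] / float(x[1])
    except TypeError:
        return float(x)

def _to_degrees(dms):
    d, m, s = [_rational(v) for v in dms]
    return d + m / 60.0 + s / 3600.0

def gps_from_image(path):
    exif = Image.open(path)._getexif() or {}
    gps = {}
    for tag, value in exif.items():
        if TAGS.get(tag) == 'GPSInfo':
            gps = dict((GPSTAGS.get(k, k), v) for k, v in value.items())
    if 'GPSLatitude' not in gps or 'GPSLongitude' not in gps:
        return None
    lat = _to_degrees(gps['GPSLatitude'])
    lon = _to_degrees(gps['GPSLongitude'])
    if gps.get('GPSLatitudeRef') == 'S':
        lat = -lat
    if gps.get('GPSLongitudeRef') == 'W':
        lon = -lon
    return lat, lon

print gps_from_image('photo.jpg')   # placeholder file; prints (lat, lon) or None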

It can be downloaded from http://ilektrojohn.github.io/creepy/. We simply need
to select the version according to our platform and install it. The next phase after
installation of Creepy is to configure the plugins that are available in it, for which
we simply need to click on the Plug-in Configuration button present under the edit
tab. Here we can select the plugins and, using their individual configuration wizards,
configure them accordingly. Once the configuration is done we can check whether it
is working properly or not using the Test Plugin Configuration button.



FIGURE 6.1 

Configure Creepy. 

After the configuration phase is done, we can start a new project by clicking on 
the person icon on the top bar. Here we can name the project and search for people 
on different portals. From the search results we can select the person of interest and 
include him/her in the target list and finish the wizard. After this our project will be 
displayed under the project bar at the right-hand side. 






FIGURE 6.2 

Search users. 

Now we simply need to select our project and click on the target icon or right 
click on the project and click Analyze Current Project. After this Creepy will start 
the analysis, which will take some time. Once the analysis is complete, Creepy will 
display the results on the map. 



FIGURE 6.3 

Creepy results. 

Now we can see the results, in which the map is populated with markers
according to the identified geolocations. Creepy further allows us to narrow down
these results based on various filters.




Clicking on the calendar button allows us to filter the results based on a time 
period. We can also filter the results based upon area, which we can define in the form 
of radius in kilometers from a point of our choice. We can also see the results in the 
form of a heat map instead of the markers. The negative sign (-) present at the end 
can be used to remove all the filters imposed on the results. 



FIGURE 6.4 

Applying filter. 

The results that we get from Creepy can also be downloaded in the form of a CSV
file and also as KML, which can be used to display the markers in another map.

Creepy can be used for the information-gathering phase during a pentest 
(penetration test) and also as a proof-of-concept tool to demonstrate to users what 
information they are revealing about themselves. 
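Because the CSV export is essentially rows of latitude/longitude values, turning it into
KML for other mapping tools takes only a few lines. A rough sketch is shown below;
the column names ('latitude', 'longitude', 'context') and the file names are assumptions
and may differ from Creepy's actual export, so adjust them to the file you get.

# Convert a CSV of latitude/longitude rows into a minimal KML file (Python 2.7).
import csv

KML_DOC = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
           '%s\n'
           '</Document></kml>\n')
PLACEMARK = ('<Placemark><name>%s</name>'
             '<Point><coordinates>%s,%s</coordinates></Point></Placemark>')

def csv_to_kml(csv_path, kml_path):
    marks = []
    with open(csv_path, 'rb') as f:              # 'rb' for the Python 2 csv module
        for row in csv.DictReader(f):
            # KML expects longitude first, then latitude
            marks.append(PLACEMARK % (row.get('context', ''),
                                      row['longitude'], row['latitude']))
    with open(kml_path, 'w') as out:
        out.write(KML_DOC % '\n'.join(marks))

csv_to_kml('creepy_export.csv', 'creepy_export.kml')   # placeholder file names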



FIGURE 6.5 

Download Creepy results. 




THEHARVESTER 

TheHarvester is an open source intelligence (OSINT) tool for obtaining e-mail
addresses, employee names, open ports, subdomains, host banners, etc. from public
sources such as search engines like Google and Bing and other sites such as LinkedIn.
It’s a simple Python tool which is easy to use and contains different information-gathering
functions. Being a Python tool it’s quite understandable that to use this tool we must
have Python installed in our system. This tool was created by Christian Martorella and
is one of the simplest, most popular, and widely used tools in terms of information gathering.

TheHarvester can be found here: http://www.edge-security.com/theharvester.php 

Generally we need to input a domain name or company name to collect relevant
information such as e-mail addresses, subdomains, or the other details mentioned in
the above paragraph, but we can also use keywords to collect related information.

We can narrow our search, for example by specifying which particular public source
we want to use for the information gathering. There are lots of public sources that
theHarvester uses for information gathering, but before moving to that let’s understand
how to use it.

EX: theharvester -d example.com -l 500 -b google

-d = Generally, domain name or company name

-l = Number of results to work with

-b = Specifying the data source; in the above command it’s Google, but apart from
that we can use LinkedIn, or all (to use all the available public sources), as a
source to collect information.




FIGURE 6.6 




TheHarvester in action. 




Apart from the above-mentioned options theHarvester also has other options to specify,
such as:

-s = to start with a particular result number (the default value is 0)

-v = to get virtual hosts by verifying hostnames via DNS resolution

-f = for saving the data (formats available: HTML or XML)

-n = to perform a DNS resolve query for all the discovered ranges

-c = to perform a DNS bruteforce for all domain names

-t = to perform a DNS TLD expansion discovery

-e = to use a specific DNS server

-l = to limit the number of results to work with

-h = to use the Shodan database to query discovered hosts.



FIGURE 6.7 

TheHarvester HTML results. 

The sources it uses are Google, Google profiles, Bing, pretty good privacy
(PGP) servers, LinkedIn, Jigsaw, Shodan, Yandex, name servers, people123, and



Exalead. Google, Yandex, Bing, and Exalead are search engines that are used in the
backend as a source, while Shodan is also a search engine but not a conventional
one; we already discussed a bit about it earlier and we will discuss it in detail later
in this chapter. PGP servers are like key servers used for data security and those
are also a good source to collect e-mail details. People123 is for searching for a
particular person and Jigsaw is a cloud-based solution for lead generation and other
sales tasks. From different sources theHarvester collects different information; for
example, for e-mail harvesting it uses Google, Bing, PGP servers, and sometimes
Exalead, and runs their specific queries in the background to get the desired result.
Similarly for subdomains or host names it again uses Google, Bing, Yandex, Exalead,
and PGP servers. And finally for the list of employee names it uses LinkedIn, Google
profiles, people123, and Jigsaw as the main sources.

This is how theHarvester harvests all the information and gives us the desired 
result as per our query. So craft your query wisely to harvest all the required 
information. 
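The underlying idea—query a search engine for the target domain and pull out anything
that looks like an e-mail address—can be sketched in a few lines of Python 2.7. This is
an illustration only, not theHarvester's actual code: it assumes the requests package,
the search engine's markup and paging parameter may change at any time, and automated
scraping may breach a search engine's terms of service.

# Crude e-mail harvesting from search results pages for a given domain.
import re
import requests

def harvest_emails(domain, pages=1):
    found = set()
    pattern = re.compile(r'[\w.+-]+@' + re.escape(domain))
    for page in range(pages):
        resp = requests.get('https://www.bing.com/search',
                            params={'q': '@' + domain, 'first': page * 10 + 1},
                            headers={'User-Agent': 'Mozilla/5.0'})
        found.update(pattern.findall(resp.text))
    return sorted(found)

print harvest_emails('example.com')   # placeholder domain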


SHODAN 

We have previously discussed Shodan briefly in Chapter 4, but this unique
search engine deserves much more than a paragraph to discuss its usage and impact.
As discussed earlier Shodan is a computer search engine. The internet consists of
various different types of devices connected online and available publicly. Most of
these devices have a banner, which they send as a response to an application request
sent by a client. Many if not most of these banners contain information which
can be called sensitive in nature, such as server version, device type, authentication
mode, etc. Shodan allows us to search for such devices over the internet and also
provides filters to narrow down the results.

It is highly recommended to create an account to utilize this great
tool, as it removes some of the restrictions imposed on the free usage. So
after logging into the application we will simply go to the dashboard at
http://www.shodanhq.com/home. Here we can see some of the recent searches as
well as popular searches made on this platform. This page also shows a quick
reference to the filters that we can use. Moving on, let’s see the more popular searches
listed under the URL http://www.shodanhq.com/browse. Here we can see there
are various different search queries which look quite interesting, such as webcam,
default password, SCADA, etc. Clicking on one of these directly takes us
to the result page and lists details of machines on the internet with that specific
keyword. The page http://www.shodanhq.com/help/filters shows the list of all
the filters that we can use in Shodan to perform a more focused search, such as
country, hostname, port, etc.







FIGURE 6.8 

Shodan popular searches. 




FIGURE 6.9 

Shodan filters. 


Let’s perform a simple search on Shodan for the keyword “webcam.” Shodan has
found more than 15,000 results for this keyword; though we cannot view all the
results under the free package, what we get is enough to understand its reach and the
availability of such devices on the internet. Some of these might be protected by some
kind of authentication mechanism such as a username and password, but some might be
publicly accessible without any such mechanism. We can simply find out by opening



their listed IP addresses in our browsers (Warning: It might be illegal to do so depending
upon the laws of the country, etc.). We can further narrow down these results to a
country by using the “country” filter. So our new query is “webcams country:us,” which
gives us a list of webcams in the United States of America.





FIGURE 6.10 

Shodan results for query “webcam” 


To get a list of machines with the file transfer protocol (FTP) service, residing in India,
we can use the query “port:21 country:in.” We can also perform a search for a specific IP
address, or a range of them, using the filter “net.” Shodan provides a great deal of relevant
information and its application is only limited by the creativity of its users.




FIGURE 6.11 

Shodan results for query "port:21 country:in." 




Apart from this Shodan also offers an API to integrate its data into our own
applications. There are also some other services provided by it at a price, which are
worth a try for anyone working in the information security domain. Recently there
has been a lot of development in Shodan and its associated services, which makes
this product a must try for information security enthusiasts.
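As a small taste of that API, the official shodan Python package lets us run the same
kind of filtered queries from a script. The sketch below assumes the package is installed
and that a valid API key replaces the placeholder; the query mirrors the web search
performed above.

# Query the Shodan API for FTP services in India (the "port:21 country:in" search).
import shodan

SHODAN_API_KEY = 'YOUR_API_KEY_HERE'        # placeholder, use the key from your account

api = shodan.Shodan(SHODAN_API_KEY)
try:
    results = api.search('port:21 country:in')
    print 'Total results: %s' % results['total']
    for match in results['matches'][:10]:   # free accounts only see a limited slice
        print match['ip_str'], match.get('port'), match.get('org')
except shodan.APIError as error:
    print 'Error: %s' % error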


SEARCH DIGGITY 

In the last chapter we learned a lot about using advanced search features of various
search engines and also briefly discussed the term “Google Hacking.” To perform
such functions we need to have the list of operators that we can use and will
have to type each query to see if anything is vulnerable, but what if there was a tool
which has a database of such queries and we could simply run it. Here enters Search
Diggity. Search Diggity is a tool by Bishop Fox which has a huge set of options and a
large database of queries for various search engines which allow us to gather
compromising information related to our target. It can be downloaded from
http://www.bishopfox.com/resources/tools/google-hacking-diggity/attack-tools/. The basic
requirement for its installation is Microsoft .NET Framework v4.

Once we have downloaded and installed the application, the things we need are
the search IDs and API keys. These search IDs/API keys are required so that we can
perform a larger number of searches without too many restrictions. We can find how
to get and use these keys in the contents section under the Help tab and also from
some simple Google searches. Once all the keys (Google, Bing, Shodan, etc.) are in
place we can move forward with the usage of the tool.
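Conceptually, what such a scanner does with those keys is straightforward: it takes a
stored list of dork strings, scopes each of them to the target, and submits them through
the search APIs. The toy sketch below only composes and prints the queries; the dorks
shown are illustrative examples, not Search Diggity's actual database.

# Combine a small dork list with a target domain, the way Google-hacking
# tools compose their scan queries (Python 2.7).
DORKS = [
    'filetype:sql "insert into"',                 # possible database dumps
    'intitle:"index of" "parent directory"',      # open directory listings
    'inurl:"/_layouts/settings.aspx"',            # SharePoint administrative page
]

def scoped_queries(domain, dorks=DORKS):
    return ['site:%s %s' % (domain, dork) for dork in dorks]

for query in scoped_queries('example.com'):       # placeholder target
    print query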




FIGURE 6.12 

Search Diggity interface. 




There are many tabs in the tool, such as Google, Bing, DLP, Flash, Shodan, etc.
Each of these tabs provides specialized functions to perform targeted searches to
identify information which can be critical from an information security point of
view.

To use the tool we simply need to select one of the tabs at the top and further
select the type of queries that we want to use. We can also specify the domain that
we want to target and simply perform the scan. Depending upon what is available
online, the tool will provide us the results for various different queries related to the
query type we have selected. It is highly recommended to select only the query types
that we are really interested in, as it will help us to narrow down the total number
of queries. The queries present are categorized properly so that it is easy to identify
them and make a choice accordingly.

Let’s use the queries to identify SharePoint Administrative pages. For this 
we simply need to select the Google tab and from the left-hand menu, check the 
Administrative checkbox under SharePoint Diggity, and run the scan. 



FIGURE 6.13 

Search Diggity scan—Google tab. 

To make this scan more targeted we can specify a list of targets under the option 
Sites/Domains/IP Ranges. As soon as we start the scan we can see the results coming 
up with various information like category, page title, URL, etc. Similarly we can also 
use the Bing scan which has its own set of search queries. 






FIGURE 6.16 

Search Diggity—Shodan scan. 


RECON-NG

There are many tools for reconnaissance, but a special mention should be given to
Recon-ng. This is an open source tool written in Python, mainly by Tim Tomes
(@LaNMaSteR53). There are many other researchers, coders, and developers who have
contributed to this project. This project is one of its kind in terms of being a complete
OSINT framework. The authors might have a different opinion on my previous statement,
but still this framework helps all OSINT enthusiasts to perform the various stages of
reconnaissance in an automated way.

It mainly focuses on web-based open source reconnaissance and provides its users
with unique independent modules, elaborate and much needed command-based
help, database interaction, and a command completion facility to perform reconnaissance
deeply and at a fast pace. Apart from that it’s made in such a fashion that if a newbie
in the field of security wants to contribute to it, he/she can easily do so with a little
Python knowledge. This is possible because of its well-structured modules, fully
fledged documentation, and the use of only native Python functions, so that a new user
or contributor will not face problems downloading and installing third-party Python
modules for a specific task.


The tool can be downloaded from: https://bitbucket.org/LaNMaSteR53/recon-ng
The user guide: https://bitbucket.org/LaNMaSteR53/recon-ng/wiki/Usage_Guide
The development guide: https://bitbucket.org/LaNMaSteR53/recon-ng/wiki/Development_Guide




Apart from the perspective of a developer or contributor, the author also focused on
ease of use for the users. The framework looks quite similar to Metasploit, which is a
quite popular exploitation tool in the information security community. If you are from
the information security community or have prior experience of using Metasploit,
using Recon-ng will feel quite the same.

Recon-ng is quite easy to install, and to run it we just need Python
2.7.x installed in our system. Just call the recon-ng.py file from a terminal and you
will get a fancy banner of the tool with credits and, along with that, a recon-ng
prompt.

To check all the available commands we can use the command help. It will show all
the available commands:

> help 


add           Adds records to the database
back          Exits the current context
del           Deletes records from the database
exit          Exits the framework
help          Displays this menu
keys          Manages framework API keys
load          Loads specified module
pdb           Starts a Python Debugger session
query         Queries the database
record        Records commands to a resource file
reload        Reloads all modules
resource      Executes commands from a resource file
search        Searches available modules
set           Sets module options
shell         Executes shell commands
show          Shows various framework items
spool         Spools output to a file
unset         Unsets module options
use           Loads specified module
workspaces    Manages workspaces


This framework provides some fine features, such as workspaces. A workspace
consists of its own settings, database, etc., and acts as a self-contained place for a
single project.

To know more about workspaces, we can use the command 
> help workspaces 




This command is used to manage workspaces, giving the user the freedom to
list, add, select, and delete workspaces. If a user does not set a workspace explicitly,
then he/she will be under the default workspace. If we want to check exactly which
workspace we are in, the command is

> show workspaces 

+------------+
| Workspaces |
+------------+
| default    |
+------------+

And we will get something similar to the above, showing that we are under the default
workspace.

Let’s say we want to change the workspace to something of our own choice, say
osint; then the command would be

> workspaces add osint

The prompt itself shows the workspace, so the default prompt we will get in a fresh
installation is

[recon-ng] [default] >

After the above command the prompt will change into

[recon-ng] [osint] >

Now it’s time to explore the commands and their capabilities. If you are using this
tool for the first time, the most needed command after “help” is “show.”

[recon-ng] [osint] > show

Using this command we can see the available details of banner, companies, contacts,
credentials, dashboard, domains, hosts, keys, leaks, locations, modules, netblocks,
options, ports, pushpins, schema, vulnerabilities, and workspaces, but here we
want to explore the modules section to see what possibilities are available.


Basically recon-ng consists of five different sections of modules. 

1. Discovery 

2. Exploitation 

3. Import 

4. Recon 

5. Reporting 






FIGURE 6.17 

Recon-ng modules. 


And by using the following command we will be able to see more details of the
available options under these five sections

[recon-ng] [osint] > show modules

such as, under discovery, interesting files; under exploitation, command injection;
under import, CSV files; under recon, company contacts, credentials, host details,
location information, and many more; and last but not the least, under reporting,
CSV, HTML, XML, etc.

Now we can use these modules based on our requirements. To use any of these
modules we first need to load it using the following command, but before
that we must know that this framework has a unique capability to load a module
by auto-completing its name or, if more modules are available for a single keyword,
by listing all of them. Let’s say we want to check the pwnedlist module and we
are too lazy to type the absolute command. Nothing to worry about, just do as shown
below

[recon-ng] [osint] > load pwnedlist

Now recon-ng will check whether this string is associated with a single module
or multiple modules. If it is associated with a single module then it will
load that one, or else it will give the user all the available modules that contain this
keyword.






FIGURE 6.19 

Recon-ng module detailed information. 


This command provides detailed information: the name of the module, its path, the
author name, and a description in detail.

As we can see from the above figure, we need to add a SOURCE as an input
to run this module, and what kind of input is needed is also mentioned in the bottom
part of the same figure. We can craft a command such as

[recon-ng] [osint] [pwnedlist] > set SOURCE google@gmail.com

This command is accepted as we provided a proper and valid input. Now the command
to run this module, to check whether the above e-mail id has been pwned somewhere
or not, is

[recon-ng] [osint] [pwnedlist] > run


[recon-ng] [osint] [pwnedlist] > set SOURCE google@gmail.com
SOURCE => google@gmail.com
[recon-ng] [osint] [pwnedlist] > run

[*] google@gmail.com => Pwned! Seen at least 27 times, as recent as 2014-08-28.

SUMMARY

[*] 1 total (0 new) items found.

FIGURE 6.20 

Recon-ng results. 




Voila! The above e-mail id has been pwned somewhere. If we want to use some 
other modules we can simply use the “load” command along with the module name 
to load and use the same. 

This is how we can easily use recon-ng. The commands and approach will
remain quite the same: first look for the modules, choose the required module, load it,
check its options, provide values for the required fields, and then run. If required,
repeat the same process to extend the reconnaissance.
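Since the framework also understands resource files (see the record and resource
commands in the help listing earlier), a repetitive sequence like the one above can be
kept in a plain text file and replayed in one go. A small sketch of such a resource file,
reusing the workspace and module from the example above, might look like this:

workspaces add osint
load recon/contacts-creds/pwnedlist
set SOURCE google@gmail.com
run
exit

Feeding this file to the resource command then executes the whole sequence without
retyping each step.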

Now let’s discuss some of the scenarios and the modules that can be handy for 
the same. 

CASE 1 

If we are into sales and desperately wanted to collect database to gather pro¬ 
spective clients then there are certain modules available here that will be pretty 
helpful. If we want to gather these information from social networking sites, 
Linkedln is the only place where we can get exact names and other details as 
compared to other sites which generally consists of fancy aliases. And if we are 
in to core sales then we might have heard of portals like Sales Force or Jigsaw, 
where we can get certain details either by free or by paying reasonable amount 
of money. And mostly nowadays in IT sector sales teams focus less on cold call¬ 
ing and more on spreading details on e-mail. So getting valid e-mails from a 
target organization is always like half work done for sales team. So here we will 
discuss the sources available to get these information and its associated modules 
in recon-ng. 

Available modules: 

recon/companies-contacts/facebook 

recon/companies-contacts/jigsaw

recon/companies-contacts/linkedin_auth 

These are some, though not all, of the modules that can be helpful to gather information
such as names, positions, addresses, etc.

But e-mail addresses are the key to making contact, so let’s look into some options to
collect e-mail addresses. We can collect some e-mail id details from the Whois database.
Search engines and PGP servers also sometimes play a vital role in collecting e-mail
addresses.

Available modules: 

recon/domains-contacts/pgp_search 

recon/domains-contacts/whois_pocs 

CASE 2 

Physical tracking. The use of smartphones has, intentionally or unintentionally, allowed
users to add their geolocation to the data that they upload to different public sites



such as YouTube, Picasa, etc. In that case we can collect information with the help of
geotagged media. This can be used for behavioral analysis, understanding a person’s
likes and dislikes, etc.

Available modules: 

recon/locations-pushpins/flickr 

recon/locations-pushpins/picasa

recon/locations-pushpins/shodan 

recon/locations-pushpins/twitter 

recon/locations-pushpins/youtube 


CASE 3 

If an organization or a person wants to check whether his/her or any of the company’s
e-mail ids have been compromised, then there are certain modules that can be helpful.
Similar to what we already discussed above, i.e., pwnedlist, there are other modules that
can give similar results:

recon/contacts-creds/pwnedlist 

recon/contacts-creds/haveibeenpwned 

recon/contacts-creds/should_change_password 


CASE 4 

For penetration testers this framework is also like hidden treasure, because they can
perform penetration testing without sending a single packet from their environment.
The first step in any penetration test is information gathering. Let’s say we want
to perform a web application penetration test; then the first thing we want to enumerate
is what technology or server the site is running on, so that we can later manually
search for publicly available exploits to exploit the same. In this case recon-ng
has a module to find the technology details for us.

Available module: 

recon/domains-contacts/builtwith 

Now after getting these details we generally look on the net for the vulnerabilities
associated with that technology. But we can also look at the vulnerabilities
associated with that domain, and this is possible by the use of the punkspider
module. PunkSpider uses a web scanner to scan the entire web, collect detailed
vulnerabilities, and store them in its database, which can be used to directly search
for the exposed vulnerabilities in a site.

Available modules: 

recon/domains-vulnerabilities/punkspider 

recon/domains-vulnerabilities/xssed






FIGURE 6.21 

Recon-ng PunkSpider in progress. 


Now, from a network penetration-testing perspective, port scanning is also an
important thing, and this framework has modules to perform port scanning.

Available module:

recon/netblocks-ports/census_2012

Apart from these there are direct exploitation modules available, such as

exploitation/injection/command_injector

exploitation/injection/xpath_bruter

There are different modules for different functions; one major function among
them is credential harvesting. Researchers are still contributing to this project and the
authors keep expanding its features. The ease of use and the structured modules make
this framework one of the most popular tools for OSINT.


YAHOO PIPES 

Yahoo Pipes is a unique piece of application from Yahoo which provides users the
freedom to select different information sources and define some customized rules
according to their own requirements to get filtered output. The best thing about the tool
is its cool GUI, where a normal internet user can also create his/her own pipes to get the
desired filtered information from different sources.




As we are all OSINT enthusiasts, the only thing that matters to us is valid, required
information. Information is available in different parts of the web, and there
are different sources to get information from regularly. The problem is how to
differentiate the information we want from the mass of information provided by a
particular source. If we need to filter the required information manually from a set of
information, it requires a lot of manual effort. So, to ease the process, this application
will help us a lot.

Requirements: 

• A web browser 

• Internet connectivity 

• Yahoo Id 

As it’s a web application we can access it from anywhere, and the minimal
dependencies, along with its user-friendly GUI, make it all the more usable. We can
access the application at the below-mentioned URL.

https://pipes.yahoo.com/

Visit this URL, log in with your Yahoo id, and we are all set to use this application.
Another major plus point of this application is its well-formed documentation. Apart
from that we can find links to different tutorials (text as well as video) on the application
site itself, describing how to start and other advanced stuff. Along with that, for reference
purposes, there are also links to popular pipes available. Let’s create our own pipe.

To create our own pipe we need to click on the Create Pipe button in the application.
It will redirect to http://pipes.yahoo.com/pipes/pipe.edit

In the top right corner we can find tabs like new, save, and properties. By default
there is no necessity to do anything with these tabs. As we are about to start creating
a new pipe, the thing to be noted is that on the left side of the application we will
find different tabs and subtabs such as sources, user inputs, operators, URL, etc.

These are the tabs from where we can drag the modules to design the pipe.
Basically a pipe starts with a source or multiple sources. Then we need to create
some filters as per our requirements using operators, date, location, etc., and then
finally we need to add an output to get the desired filtered information.

So to start with let’s drag a source from the Sources subtab; there are different
options available such as Fetch CSV, Fetch Data, Fetch Feed, etc. Let’s fetch from
feeds as they are a very good source of information. Drag the Fetch Feed subtab to the
center of the application. When we drag anything to the center it will generate an
output box for us, where it will ask us to add the feed URL. Add any feed URL; in my
case I am using http://feeds.bbci.co.uk/news/rss.xml?edition=int.

For demo purposes I’ll show only a single-source example, but we can also add
multiple sources to one pipe. Now it’s very important to create a proper filter, which
will give us the proper output. Drag the Filter subtab from the Operators tab. By default
we will see the “block,” “all,” and “contains” keywords there and some blank spaces to
fill. Change “block” to “Permit,” keep the “all” as it is, and add item description in the
first blank space, followed by “contains,” followed by US. So our filter will only pass
data which contains the keyword “US” in its item description.
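For comparison, the same fetch-and-filter pipeline can be reproduced in a few lines of
Python 2.7, assuming the feedparser package is installed; the pipe itself, of course,
needs no code at all.

# Fetch the BBC news feed and keep only the items whose description
# mentions "US" -- the same filter built in the pipe above.
import feedparser

feed = feedparser.parse('http://feeds.bbci.co.uk/news/rss.xml?edition=int')
for entry in feed.entries:
    if 'US' in entry.get('description', ''):
        print entry.title, '-', entry.link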






FIGURE 6.22 

Creating a Yahoo Pipe. 


Now connect all the pipe points from the source box (Fetch Feed) to the Filter box and
from the Filter box to the Pipe Output box. First save the pipe and then run it to get
the output in a new tab.





FIGURE 6.23 

Yahoo Pipe result. 




We can use it in many other scenarios, like collecting images of a specific person
from Flickr, filtering information by URL, date, or location, and many others.
Explore this to create pipes as customized as possible. This tool provides the
freedom to create pipes way beyond our imagination.


MALTEGO 

There are many OSINT tools available in the market, but one tool stands out because
of its unique capabilities, Maltego.

Maltego is an OSINT application which provides a platform to not only extract data
but also represent that data in a format which is easy to understand as well as analyze.
It’s a one stop shop for most of the recon requirements during a pentest; what adds to its
already great functionality is the feature which allows users to create custom add-ons
for the platform (which we will discuss later) depending upon the requirement.

Currently Maltego is available in two versions: commercial and community.
The commercial version is paid and we need a license key for it. The community
version, however, is free and we only need to register at the site of Paterva (the creator
of Maltego) at this page: https://www.paterva.com/web6/community/maltego/index.php.
Though the community version has some limitations in comparison to the commercial
version, like a limited amount of data extraction, no user support, etc., it is still good
enough to feel the power of this great tool. During this chapter we will be using the
community version for demo purposes.

Let’s see how this tool works and what we can utilize it for. 

First of all, unlike most of the application software used for recon, Maltego provides
a GUI, which not only makes it easier to use but is a feature in itself, as the
data representation is what makes it stand out from the crowd. It basically works on
a client-server architecture, which means that what we as users get is a Maltego client
which interacts with a server to perform its operations.

Before going any further let’s understand the building blocks of Maltego as listed
below.


ENTITY 

An entity is a piece of data which is taken as an input to extract further information.
Maltego is capable of taking a single entity or a group of entities as an input to extract
information. Entities are represented by icons over entity names, e.g. the domain name
xyz.com is represented by a globe-like icon.

TRANSFORM 

A transform is a piece of code which takes an entity (or a group of entities) as an
input and extracts data in the form of an entity (or entities) based upon the relationship.
E.g. DomainToDNSNameSchema: this transform will try to test various name schemas
against a domain (entity).




MACHINE 

A machine is basically a set of transforms linked programmatically. A machine
is very useful in cases where the starting data (in the form of an entity) and the
desired output data are not directly linked through a single transform but can be
reached through a series of transforms in a custom fashion. E.g. Footprint L1:
a machine which takes a domain as an input and generates various types of
information related to the organization, such as e-mails, Autonomous System (AS)
number, etc.

First of all, as mentioned above, we need to create an account for the community
version. Once we have an account we need to download the application from
https://www.paterva.com/web6/products/download3.php. The installation is pretty
straightforward and the only requirement is Java. Once the installation is complete
we simply need to open the application and log in using the credentials created during
the registration process.

Now that the installation and login processes are complete, let’s move on to the
interface of Maltego and understand how it works. Once we are logged into the
application it will provide us with some options to start with; we will be starting with
a blank graph so that we can understand the application from scratch. Now Maltego
will present a blank page with different options on the top bar and a palette bar on the
left. This is the final interface we will be working on.



FIGURE 6.24 

Maltego interface. 


On the top left corner of the interface is the Maltego logo, clicking on which will
list down the options to create a new graph, save the graph, import/export
configurations/entities, etc. The top bar in the interface presents five options; let’s
discuss them in detail:




INVESTIGATE 

This is the first option in the top bar which provides basic functions such as cut, 
copy, paste, search, link/entity selection, as well as addition. One important option 
provided is Select by Type, this options comes in handy when there is a huge amount 
of data present in the graph after running a different set of transforms or machines 
and we are seeking a specific data type. 

MANAGE 

The Manage option basically deals with entity and transform management, with some
other minor functions such as notes and different panel arrangements. Under the
Entities tab we get the options to create new entities, manage existing ones, and
import/export them; similarly the Transforms tab presents the options to discover new
transforms, manage existing ones, and create new local transforms (we will discuss
creating local transforms in a later chapter).



FIGURE 6.25 

Maltego Manage tab. 

ORGANIZE 


Once we are done with extracting the data, we need to set the arrangement of the
graph to get a better understanding of it; this is where the Organize option comes
in. Using the underlying options we can set the layout of the complete graph or of
selected entities into different forms, such as Hierarchical, Circular, Block, etc. We
can also set the alignment of entities using the functions under the “Align Selection” tab.



FIGURE 6.26 

Maltego Organize tab. 




MACHINES 

As described before, machines are an integral part of the application. The Machines tab
provides the options to run a machine, stop all machines at once, create new machines
(which we will discuss in a later chapter), and manage existing ones.


COLLABORATION 

This tab is used to utilize the feature, introduced in later versions of Maltego, which
allows different users to work as a team. Using the underlying options users can share
their graphs with other users in real time as well as communicate through the chat
feature. This feature can be very helpful in Red Team environments.

The palette bar on the left is used to list all the different types of entities present 
in Maltego. The listed entities are categorized according to their domain. Currently 
Maltego provides 20+ entities by default. 

Now that we are familiar with the interface, we can move on to the working of
Maltego.

First of all, to start with Maltego we need a base entity. To bring an entity into the
graph we simply need to drag and drop the entity type we need to start with from
the palette bar on the left. Once we have the entity in the graph, we can either double
click on the name of the entity to change its value to the value we desire, or double
click on the entity icon, which pops up the details window where we can change data,
create a note about that entity, attach an image, etc. One thing that we need to keep in
mind before going any further is to provide the entity value correctly depending upon
the entity type, e.g., don’t provide a URL for an entity of type “domain.”

Once we have set the value of an entity we need to right click on that entity and check
the transforms listed for that specific entity type. Under the “Run Transform” tab we
can see the “All Transforms” tab at the top, which will list all the transforms available
for the specific entity type; below that tab we can see different tabs which contain the
same transforms classified under different categories. The last tab is again “All
Transforms,” but use this one carefully as it will execute all the listed transforms at
once. This will take up a lot of time and resources and might result in a huge amount
of data that we don’t desire.

Now let’s take up the example of a domain and run some transforms. To do
this simply drag and drop the domain entity, under infrastructure, from the palette
bar to the graph screen. Now double click on the label of the entity and change
it to, let’s say, google.com. Now right click on it, go to “All Transforms,” and
select “To DNS Name - NS (name server).” This transform will find the
name server records of a domain. Once we select the transform we can see that
results start to populate on the graph screen. The progress bar at the bottom of
the interface shows if the transform is complete or is still running. Now we can
see that Maltego has found some name server (NS) records for the domain. We
can further select all the listed NS records and run a single transform on them.
To do this, simply select the region containing all the records and right click to
select a transform. Let’s run the transform “To Netblock [Blocks delegated to



this NS],” this transform will check if the NS records have any (reverse) DNS
netblocks delegated to them. In the graph window itself we can see at the top
that there are some options to try, like Bubble View, which shows the graph as
a social network diagram with the entity size depending upon the number of
inbound and outbound edges; the Entity List, which as the name suggests lists down
all the entities in the graph; and some others like freeze view and change layout to
Block, Hierarchical, Circular, etc.



FIGURE 6.27 

Maltego Transform result (Domain to DNS Name - NS (name server)). 

Similar to running a transform on an entity, we can also run a machine. Let’s stick
to our example and take a domain entity with the value google.com. Now we simply
need to right click on the entity, go to the “Run Machines” tab, and select a machine.
For this example let’s simply run the machine “Footprint L1.” This machine will
perform a basic footprint of the domain provided. Once this machine has executed
completely we can see that it displays a graph with different entities such as name
servers, IP addresses, websites, AS number, etc. Let’s move forward and see some
specific scenarios for data extraction.

DOMAIN TO WEBSITE IP ADDRESSES 

Simply take a domain entity. Run the transform “To Website DNS [using Search 
Engine].” It queries a search engine for websites and returns the response as 
website entities. Now select all the website entities we got after running the 
transform and run the transform “To IP Address [DNS].” This will simply run 
a DNS query and get us the IP addresses for the websites. This sequence of 




transforms can help us to get a fair understanding of the IP range owned by the 
organization (owning the domain). We can also see which websites have multiple 
IP addresses allocated to them. Simply changing the layout of the graph, to 
say circular, can be helpful in getting a better understanding of this particular 
infrastructure. Information like this is crucial for an in-depth pentest and can play 
a game changing role. 

E.g.: Domain = google.com 



FIGURE 6.28 

Maltego Transform result (Domain to Website IP). 
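Outside Maltego, the second step of this sequence (website to IP address) can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library's resolver; the hostnames below are placeholders, not output taken from the tool:

# Minimal sketch: resolve a list of website hostnames to IP addresses,
# similar in spirit to Maltego's "To IP Address [DNS]" transform.
import socket

websites = ["www.google.com", "mail.google.com", "maps.google.com"]  # placeholder list

for site in websites:
    try:
        name, aliases, ips = socket.gethostbyname_ex(site)
        print(site, "->", ", ".join(ips))
    except socket.gaierror as err:
        print(site, "-> resolution failed:", err)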

DOMAIN TO E-MAIL ADDRESS 

There is a set of transforms for extracting e-mail addresses directly from a domain,
but for this example we will be following a different approach using metadata. Let's
again take a domain entity and run all the transforms in the set "Files and Documents
from Domain." As the name itself says, it will look for files listed in search engines for
the domain. Once we get a bunch of files, we can select them and run the transform
"Parse meta information." It will extract the metadata from the listed files. Now let's
run all the transforms in the set "Email addresses from person" on the resulting person
entities and provide the appropriate domain (the domain we are looking for in the e-mail
address) and a blank for additional terms. We can see the result from this final transform
and compare it with the result of running the transform set for e-mail extraction
directly on the domain and see how the results are different.

E.g.: Domain = paterva.com 





FIGURE 6.29 

Maltego Transform result (Domain to Email address). 


PERSON TO WEBSITE 

For this example we will be using the machine "Person - Email address." Let's take
an entity of type person, assign it the value "Andrew MacPherson," and run the
machine on this entity. The machine will start to enumerate associated e-mail IDs
using different transforms. Once it has completed running one set of transforms it
will provide us the option to move forward with selected entities enumerated so
far. From the above example we know "andrew@punks.co.za" is a valid e-mail
address, so we will go ahead with this specific entity only. What we get as an end
result is the websites where this specific e-mail address occurs, by running the transform
"To Website [using Search Engine]" (as a part of the machine).

The examples shown clearly demonstrate the power of this sophisticated tool.
Running a series of transforms or a machine can enumerate a lot of data which
can be very helpful during a pentest or a threat-modeling exercise. Extracting a
specific type of data from another data type can be done in different ways (using
different series of transforms). The best way to achieve what we want is to run a
series of transforms, eliminate the data we don't need, then run another sequence
of transforms in parallel to verify the data we have got. This exercise not only helps
to verify the credibility of the data we have got but sometimes also produces unique
revelations.

Maltego even allows us to save the graph we have generated into a single file in
the "mtgx" format for later usage or sharing. We can even import and export entities as
well as configurations. This feature allows us to carry our custom environment with
us and use it even on different machines.





FIGURE 6.30 

Saving Maltego results. 

Apart from the prebuilt transforms Maltego allows us to create our own transforms.
This feature allows us to customize the tool to extract data from various other
sources that we find useful for a specific purpose, for example an API which allows us
to get a company name from its phone number.

For custom transforms we have got two options: 

Local transforms: These transforms are stored locally on the machine on which the
client is running. This type of transform is very useful when we don't need/want
others to run the transform or when we want to execute a task locally. They are simple
to create and deploy. The major drawback is that if we need to run them on multiple
machines we need to install them separately on each one of them, and the same is the
case for updates.

TDS transforms: TDS stands for transform distribution server. It is a web application
which allows the distribution as well as management of transforms. The client simply
probes the TDS, which calls the transform scripts and presents the data back to
the client. Compared to local transforms they are easy to set up and update.

We will learn how to create transforms in a later chapter.
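As a rough preview of that, a local transform is essentially just a script that receives the selected entity's value from the Maltego client and prints entities back in the XML response format the client expects. The sketch below is only illustrative and assumes Python; the lookup itself is a placeholder (it merely resolves the domain to IP addresses), and a real transform would query whatever data source we are interested in:

# Illustrative local transform sketch: domain in, IP address entities out.
# Maltego passes the selected entity's value as the first argument and
# reads the XML response printed to stdout.
import sys
import socket

domain = sys.argv[1] if len(sys.argv) > 1 else "example.com"

# Placeholder lookup: a real transform would query name servers, an API, etc.
try:
    results = socket.gethostbyname_ex(domain)[2]
except socket.gaierror:
    results = []

print("<MaltegoMessage><MaltegoTransformResponseMessage><Entities>")
for ip in results:
    print('  <Entity Type="maltego.IPv4Address"><Value>%s</Value></Entity>' % ip)
print("</Entities></MaltegoTransformResponseMessage></MaltegoMessage>")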

So these are some of the tools which can play a very crucial part in an information-gathering
exercise. Some of these are more focused on information security and some are
generic. The main takeaway here is that there are a bunch of tools out there which can
help us to extract relevant information within minutes, and if used in a proper and efficient
manner these tools can play a game-changing role in our data extraction process. There
is something for everyone; it's just a matter of knowing how data is interconnected and
hence how one tiny bit of information may lead to Pandora's box. In the next chapter
we will move forward and learn about the exciting world of metadata. We will deal with
topics like what metadata is, how it is useful, how to extract it, etc. We will also deal with
topics like how it can be used against us and how to prevent that from happening.





















CHAPTER 7

Metadata



INFORMATION IN THIS CHAPTER 

• Metadata 

• Impact 

• Metadata Extraction 

• Data Leakage Protection (DLP) 


INTRODUCTION 

In the last few chapters we have learned extensively about how to find information 
online. We learned about different platforms, different techniques to better utilize 
these platforms, and also tools which can automate the process of data extraction. In 
this chapter we will deal with a special kind of data, which is quite interesting but 
usually gets ignored, the metadata. 

Earlier, metadata was a term mostly talked about in the field of information science,
but with recent news stories stating that the National Security
Agency has been collecting metadata related to the phone records of its citizens, it is
becoming a household name. Still, many people don't understand exactly
what metadata is and how it can be used against them, let alone how to safeguard
themselves from an information security point of view.

The very basic definition of metadata is that it's "data about data," but sometimes
that is a bit confusing. For understanding purposes we can say that metadata
is something which describes the content somehow but is not part of the
content itself. For example, in a video file the length of the video can be its metadata,
as it describes how long the video will play, but it is not part of the video itself.
Similarly for an image file, the make of the camera used to click that picture can be
its metadata, or the date when the picture was taken, as it tells us something related to
the picture but is not actually the content of the picture. We all have encountered
this kind of data related to different files at some point. Metadata can be
anything: the name of the creator of the content, time of creation, reason for creation,
copyright information, etc.

The creation of metadata actually started long ago in libraries, when people 
had information in the form of scrolls but no way to categorize them and find them 






quickly when needed. Today in the digital age we still use metadata to categorize
files, search them, interconnect them, and much more. Most of the files that reside in
our computer systems have some kind of metadata. It is also one of the key components
needed for the creation of the semantic web.

Metadata is very helpful in managing and organizing files and hence is used
extensively nowadays. Most of the time we don't even make a distinction between
the actual content and its metadata. It is usually added to the file by the underlying
software which is used to create the file. For a picture it can be the camera that was
used to click it, for a doc file it can be the operating system used, for an audio file
it can be the recording device. Usually it is harmless as it does not reveal any data
which can be sensitive from an information security perspective, or does it? We will see
soon in the following portion of this chapter.

There are a huge number of places where metadata is used, from the files in
our systems to the websites on the internet. In this chapter we will mainly focus
on extracting metadata from places which are critical from an information security
viewpoint.


METADATA EXTRACTION TOOLS 

Let's discuss some of the tools which can be used for metadata extraction.

JEFFREY’S EXIF VIEWER 

Exif (exchangeable image file format) is basically a standard used by devices which
handle images and audio files, such as video recorders, smartphone cameras, etc. It
contains data like the image resolution, the camera used, color type, compression, etc.
Most of the smartphones today contain a camera, a GPS (global positioning system)
device, and internet connectivity. In many smartphones, when we click a picture
the device automatically tracks our geolocation using the GPS and embeds that
information into the picture just clicked. Being active on social networks, we share
these pictures with the whole world.

Jeffrey's Exif Viewer is an online application (http://regex.info/exif.cgi)
which allows us to see the Exif data present in any image file. We can simply
upload it from our machine or provide the URL for the file. If an image contains
geolocation data, it will be presented in the form of coordinates. Exif
Viewer is based on the Exif Tool by Phil Harvey, which can be downloaded from
http://www.sno.phy.queensu.ca/~phil/exiftool/. It not only allows us to read the Exif
data but also to write it to the files. Exif Tool supports a huge list of different formats
like XMP, GIF, ID3, etc., which are also listed on the page.
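To get a feel for the kind of data these viewers read, here is a minimal sketch that dumps the Exif tags of a JPEG, including the GPS block, using the third-party Pillow library (an assumed dependency; the file name is a placeholder, and ExifTool itself remains the more complete option):

# Minimal sketch: print Exif tags (including GPS data) from a JPEG.
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

img = Image.open("photo.jpg")          # placeholder file name
exif = img._getexif() or {}            # raw Exif dictionary for JPEG/TIFF files

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)
    if name == "GPSInfo":
        gps = {GPSTAGS.get(k, k): v for k, v in value.items()}
        print("GPSInfo:", gps)         # latitude/longitude stored as rationals
    else:
        print(name, ":", value)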




[Figure content: Jeffrey's Exif Viewer output for a sample photo showing the camera (Nokia Lumia 630), exposure settings, capture date, GPS coordinates with a guessed street address, altitude, and basic file details.]

FIGURE 7.1 


Jeffrey's Exif Viewer 




FIGURE 7.2 

Exif Tool interface.




Using the geolocation in the images we share, anyone can easily track where exactly
we were at the time of clicking them. This can be misused by people with ill intentions
or stalkers. So we should be careful and decide whether we want to share just our
pictures or our locations too.

EXIF SEARCH 

We just discussed Exif and its power to geolocate content. There is a
dedicated search engine which allows us to search through geotagged images; it's
called Exif Search (http://www.exif-search.com/).

This search engine provides data about images and pictures from all
over the internet. It contains a huge number of searchable Exif images from
different mobile devices. Unlike traditional image search
engines, which tend to just provide us the image as a result, Exif Search also provides
the metadata.

When we search in Exif Search, it searches for the image and its information in
its own database and provides us the result. Currently it has more than 100 million
images with metadata and it's constantly updating its database.

This search engine provides users the freedom to search for an image based on
location, date, and device type. It also allows us to sort the data based on
date, location, or device type. Another unique feature of this search engine is that it
allows us to force the search engine to fetch results only for images that contain
GPS data. There is a small check box available just below the search bar which
does the work for us.



FIGURE 7.3 

Exif-search.com interface. 




It also supports a huge number of devices. The list can be found at
http://www.exif-search.com/devices.php; some of them are Canon, Nikon, Apple,
Fujifilm, etc.



FIGURE 7.4 

Exif-search.com sample search result. 


ivMeta 

Similar to images, video files can also contain GPS coordinates in their metadata.
ivMeta is a tool created by Robin Wood (http://digi.ninja/projects/ivmeta.php) which
allows us to extract data such as software version, date, GPS coordinates, and model
number from iPhone videos. The iPhone is one of the most popular smartphones available
and has a huge fan base. With millions of users keen to show off what the iPhone
standard can do, its owners are all the more exposed to metadata extraction.
Given the camera quality of these devices and the unique apps that make
pictures and videos look trendier, iPhone users upload lots of such content
every day on different social networking sites. Though there is an option available on
the device to deactivate geotagging, the default settings and the use of GPS create
metadata for any image or video taken. This is where this tool comes in handy




extraction to another level by supporting all the Microsoft portable executables. It also
supports torrent files, which are the easy solution to most data sharing requirements,
so torrent metadata extraction is definitely one of its unique features. Who
would have even thought of extracting metadata from TTF, or TrueType fonts? But yes, this tool also
supports the TTF format. There are many other formats it supports; we can get the details
from the following URL: https://bitbucket.org/haypo/hachoir/wiki/hachoir-metadata.

hachoir-metadata is basically a command-line tool, and by default it's very
verbose. That means that even when run without any switches it provides lots of information.

# hachoir-metadata xyz.png 

We can also run this tool with multiple and different file formats at a time to get 
the desired result. 

# hachoir-metadata xyz.png abc.mp3 ppp.flv 

When we need only MIME details we can use

# hachoir-metadata --mime xyz.png abc.mp3 ppp.flv

When we need a little more information than the MIME type we can use the --type switch

# hachoir-metadata --type xyz.png abc.mp3 ppp.flv

For exploring the tool's other options we can use

# hachoir-metadata --help
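Since hachoir-metadata is built on the hachoir Python library, the same extraction can also be scripted. The sketch below assumes the library's createParser/extractMetadata interface as found in the current Python 3 releases (older releases used the hachoir_parser and hachoir_metadata module names) and a placeholder file name:

# Sketch: extract metadata from a file using the hachoir library,
# which is what the hachoir-metadata command uses under the hood.
from hachoir.parser import createParser
from hachoir.metadata import extractMetadata

parser = createParser("xyz.png")        # placeholder file name
if parser:
    metadata = extractMetadata(parser)
    if metadata:
        for line in metadata.exportPlaintext():
            print(line)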


FOCA 

On a daily basis we work with a huge number of files such as DOC, PPT, PDF, etc.
Sometimes we create them, sometimes edit them, and sometimes just read through them. Apart
from the data we type into these files, metadata is also added to them. To a normal
user this data might seem harmless, but actually it can reveal a lot of sensitive information
about the system used to create it.

Most of the organizations today have an online presence in the form of websites
and social profiles. Apart from the web pages, organizations also use different files
to share information with the general public and these files may contain this metadata.
In Chapter 5 we discussed how we can utilize search engines to find the files that are
listed on a website (E.g. In Google: "site:xyzorg.com filetype:pdf"). So once we
have listed all these files, we simply need to download them and use a tool which can
extract metadata from them.

FOCA is a tool which does this complete process for us. Though FOCA means seal
in Spanish, the tool stands for 'Fingerprinting Organizations with Collected Archives'.
It can be downloaded from https://www.elevenpaths.com/labstools/foca/index.html.
After downloading the zip file, simply extract it and execute the application file
inside the bin folder.

To use FOCA we simply need to create a new project, provide it with a name and 
the domain to scan. Once this is saved as a project file, FOCA allows us to choose 




the search engines and the file extensions that we need to search for. After that we 
can simply start by clicking on the button “Search All.” Once we click on this button 
FOCA will start a search for the ticked file types on the mentioned domain, using 
different search engines. Once this search is complete it will display the list of all the 
documents found, their type, URL, size, etc. 

Now we have the list of the documents present on the domain. The next thing we need
to do is download the file(s) by right clicking on any one and choosing the option
Download/Download All. Once the download is complete the file(s) is/are ready for
inspection. So now we need to right click on the file(s) and click on the Extract Metadata
option. Once this is complete we can see that under the option Metadata in the
right-hand side bar FOCA has listed all the information extracted from the document(s).

This information might contain the username of the system used to create the
file, the exact version of the software application used to create it, system paths, and
much more which can be very helpful for an attacker. Metadata extraction is
not the only functionality provided by FOCA; we can also use it to identify vulnerabilities,
perform network analysis, search for backups, and gather much more, with information
gathering being its most prevalent functionality.



FIGURE 7.6 

FOCA result. 

METAGOOFIL

Similar to FOCA, Metagoofil is yet another tool to extract metadata from documents
which are available online. Metagoofil is basically a Python-based command line tool.




The tool can be downloaded from https://code.google.com/p/metagoofil/downloads/
list. Using this tool is fairly easy; there are a few simple switches that can be used to
perform the task.

The list of options is as follows:


Metagoofil options

-d: domain to search

-t: filetype to download (pdf, doc, xls, ppt, odp, ods, docx, xlsx, pptx)

-l: limit of results to search (default 200)

-h: work with documents in directory (use "yes" for local analysis)

-n: limit of files to download

-o: working directory (location to save downloaded files)

-f: output file


We can provide a query such as the one mentioned below to run a scan on a
target domain and get the result in the form of an HTML file, which can be easily read
in any browser:

metagoofil -d example.com -t doc,pdf -l 100 -n 7 -o /root/Desktop/meta -f /root/Desktop/meta/result.html




FIGURE 7.7 

Metagoofil interface. 


Similar to FOCA, Metagoofil also performs a search for documents using search
engines and downloads them locally to perform metadata extraction using various



Python libraries. Once the extraction process is complete the results are simply displayed
in the console. As mentioned above, these results can also be saved as an HTML
file for future reference using the -f switch.


[Figure content: Metagoofil HTML report for a scanned domain listing the user names and software versions (e.g., Microsoft Office Word, Acrobat Distiller, OpenOffice.org) found in the downloaded documents.]

FIGURE 7.8 

Metagoofil result. 


Similarly there are other tools which can be used for metadata extraction from various different
files, some of these are listed below:

• MediaInfo—audio and video files (http://mediaarea.net/en/MediaInfo)

• Gspot—video files (http://gspot.headbands.com/)

• VideoInspector—video files (http://www.kcsoftwares.com/?vtb#help)

• SWF Investigator—SWF/flash files (http://labs.adobe.com/downloads/swfinvestigator.html)

• Audacity—audio files (http://audacity.sourceforge.net/)


IMPACT 

The information collected using metadata extraction can be handy for crafting
many different attacks on the victim by stalkers, people with wrong motivations,
and even government organizations. The real-life scenario can be worse than what




we can expect. The information collected from the above process provides the victim's
device details, areas of interest, and sometimes geolocation; information
such as the username, software used, operating system, etc. is also very critical for an
attacker. This information can be used against the victim through simple methods
such as social engineering, or to exploit a device-specific vulnerability, and it can even
harm the victim personally in real life as it also reveals the exact locations where the victim
generally spends time.

And all of this is possible just because of some data that mostly nobody
cares about; some might not even realize its existence, and even if they do, most of
them are not aware of where this data can lead and how it makes their real as well as
virtual life vulnerable.

We have seen how much critical information is revealed through the
documents and files we upload without realizing it, and how this data can be
turned against a victim and used as an attack vector.
Now there must be a way to stop this, and it's called data leakage protection (DLP).

SEARCH DIGGITY 

In the last chapter we learned about the advanced search features of this interesting
tool. For a quick review, Search Diggity is a tool by Bishop Fox which has a huge set
of options and a large database of queries for various search engines which allow us
to gather compromising information related to our target. But in this chapter we are
most interested in one specific tab of this tool, and that is DLP.

There are a wide number of options to choose from in the sidebar of the DLP tab in
Search Diggity. Some of the options are credit card, bank account number, passwords,
sensitive files, etc.

The DLP tab is generally a dependent one; we cannot use it directly. First we
have to run some search queries on a domain of our interest, then select and download
all the files found after completion of that search query, and then provide
the path in the DLP tab to check whether any sensitive data is exposed to the public for that
particular domain or not. To do so we can choose either the Google tab or the Bing tab, which
means either the Google search engine or Bing, and in that we have to select the "DLPDiggity
initial" option to start searching for backups, config files, financial details, database
details, logs, and other files such as text or Word documents, and many more from
the domain of our interest. Though there is an option to choose only some specific
suboptions from the "DLPDiggity initial" option, for demo purposes let's search
for all the suboptions. After completion of the query we will get all the available
files in tabular format in the result section of the tool. Select all the files that we got
and download them. It will save all the files in the default path, in a folder called
DiggityDownloads.




The result sometimes might show scary things such as credit card numbers, bunches of
passwords, etc. That is the power of this tool. But our main focus is not the discovery
of sensitive files but DLP. So get all the details from the tool's final result. The result
shows, in an easy and understandable manner, what data is available in which page or
document, so that the domain owner can remove or encrypt it to avoid data loss.


METADATA REMOVAL/DLP TOOLS 

DLP is an important method to avoid data loss. The above example is quite generic
but gives us some idea about how DLP works. Now, as per our topic, we are more interested
in metadata removal. There are different tools available to remove metadata,
or we can also call them metadata DLP tools. Some of those are mentioned below.


METASHIELD PROTECTOR 

MetaShield Protector is a solution which helps to prevent data loss through office
documents published on a website. It is installed and integrated at the web server level
of the website. The only limitation is that it is available only for the IIS web
server. Other than that, it supports a wide range of office documents. Some of the popular
file types are ppt, doc, xls, pptx, docx, xlsx, jpeg, pdf, etc. On a request for any of
these document types, it cleans the document on the fly and then delivers it. MetaShield Protector
can be found at https://www.elevenpaths.com/services/html_en/metashield.html.
The tool is available at https://www.elevenpaths.com/labstools/emetrules/index.html.

MAT 

MAT, or Metadata Anonymisation Toolkit, is a graphical user interface tool which also
helps to remove metadata from different types of files. It is developed in Python and
utilizes the hachoir library for the purpose. As we discussed the hachoir
Python library and one of its projects, hachoir-metadata, a bit earlier, this is another
project based on the same library. The details regarding it can be found at
https://mat.boum.org/.

The best thing about MAT is that it is open source and supports a wide range of
file extensions such as png, jpeg, docx, pptx, xlsx, pdf, tar, mp3, torrent, etc.
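For images, the core idea behind this kind of metadata removal is simple: rebuild the file from its pixel data only, so no Exif block is carried over. The following is just a rough sketch using the Pillow library (an assumed dependency, with placeholder file names); dedicated tools like MAT, or ExifTool's "-all=" option, handle far more formats and corner cases:

# Rough sketch: strip Exif metadata from a JPEG by re-saving only the pixels.
from PIL import Image

src = Image.open("photo.jpg")             # placeholder input file
clean = Image.new(src.mode, src.size)     # a fresh image carries no Exif block
clean.putdata(list(src.getdata()))        # copy pixel data only
clean.save("photo_clean.jpg")             # saved without the original metadata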


MyDLP 

It is a product by Comodo, which also provides a wide range of security products and
services. MyDLP is a one-stop solution for different potential data leak areas. In
an organization not only documents but also e-mails, USB devices, and other similar
channels are potential sources of data leaks. MyDLP allows an organization to
easily deploy and configure this solution to monitor, inspect, and prevent all outgoing
critical data. The details of MyDLP can be found at http://www.mydlp.com.





CHAPTER 8

Online Anonymity


INFORMATION IN THIS CHAPTER 

• Anonymity 

• Online anonymity 

• Proxy 

• Virtual private network 

• Anonymous network 


ANONYMITY 


Anonymity: the basic definition of this term is "being without a name." Simply
understood, someone is anonymous if his/her identity is not known. Psychologically
speaking, being anonymous may be perceived as a reduction in the accountability
for the actions performed by the person. Anonymity is also associated with privacy,
as sometimes it is desirable not to have a direct link with a specific entity, though
sometimes it is required by law to present an identity before and/or during an action
is performed. In the physical world we have different forms of identification, such
as Social Security Number (SSN), driving license, passport, etc., which are widely
acceptable.


ONLINE ANONYMITY 


In the virtual space we do not have any concrete form of ID verification system. We
usually use pseudonyms to make a statement. These pseudonyms are usually
not related to our actual identity and hence provide a sense of anonymity. But the
anonymity present on the internet is not complete. Online we may not be identified
by our name, SSN, or passport number, but we do reveal our external IP address.

This IP address can be used to track back to the computer used. Also, on some platforms
like social networking websites we create a virtual identification as it relates to
our relationships in the physical world. Some websites have also started to ask users to
present some form of identification or information which can be related directly to a
person, in the name of security. So basically we are not completely anonymous in the



cyber space. Usually we do reveal some information which might be used to trace the 
machine and/or the person. 


WHY DO WE NEED TO BE ANONYMOUS 

There are many reasons to be anonymous and different people have different reasons
for it: some may want to be anonymous due to their work demands, such as those
who are into cyber investigation or journalism, and some might want to be anonymous
because of their concern for their privacy. There are times when we want to protest
against something but doing that openly might create some problems, so we want to
be anonymous. And just as in physical life criminals want to go underground
after committing a crime, in virtual life or on the internet cyber-criminals and hackers want to be anonymous.

Being anonymous is just a choice; it does not always need a reason. It's just a
state to be in in virtual life, a virtual lifestyle: some want to enjoy it
and others might be forced into it. Similar to the physical world we do have a need or
desire to stay anonymous on the internet. It may just be that we are concerned about
our privacy, we want to make a statement but won't do it with our true identity, we
need to report something to someone without getting directly involved, we need to communicate
sensitive information, or we simply want to be a stranger to strangers (anonymous
forums, chat rooms, etc.). Apart from the mentioned reasons, we may simply want
to bypass a restriction put up by an authority (e.g., college Wi-Fi) to visit certain
portions of the web. The motivation behind it can be anything, but the requirement is
surely there.

People might think that being anonymous means just hiding the identity. It can also
be about hiding what you are doing and what you want to be. A simple
example can help us understand this. Let's say we wanted to buy something and
we visited an e-commerce site to buy it. We liked the product but due to some reason
we did not buy it. But as we kept surfing normally, we may have found advertisements for
the same product all over the internet. It's just a marketing approach of the e-commerce
giants: by tracking a user's cookies they understand his/her likes and dislikes and post
advertisements accordingly.

Some might like this and some might not. It's not just that somebody is
monitoring what we are doing on the internet, but also that we are flooded with ads
about similar things to lure us to buy. To avoid such scenarios too, people might
prefer to browse anonymously. For a quick revision, there are private browsing
options available in most of the browsers and there are specific anonymous browsers
available that do this work for us.

In this chapter we will deal with different ways to stay anonymous online. 100%
anonymity cannot be guaranteed on the internet; still, with the tools and techniques
that will be mentioned in this chapter, we can hide our identity up to a reasonable
level.




WAYS TO BE ANONYMOUS 

There are many ways to be anonymous and there are many aspects of being anonymous.
Some might focus on the personal details to be hidden, such as on social
networking sites, by using aliases, generic or fake information, a generic
e-mail ID, and other details. Some might want to be anonymous while browsing so
that nobody can track what resources they are looking into. Some might want to hide
their virtual identity, such as their IP address, etc.

There are different ways to achieve the above conditions, but the major and popular
solutions available are either a proxy or a virtual private network (VPN). Though
there are other methods to be anonymous, these two are widely used and we
will focus mainly on them in this chapter.


PROXY 

Proxy is a word generally used for doing stuff on behalf of someone or something.
Similarly in technology, a proxy can be treated as an intermediate solution that forwards
the request sent by the source to the destination, collects the response from the
destination, and sends it back to the source.

It is one of the widely used solutions for anonymity. The main reason to use
a proxy is to hide the IP address. There are different proxy solutions available such as
web proxies, proxy software, etc. Basically all the solutions work on the same basic principle:
redirect traffic to the destination from some other IP address. The process might
differ from solution to solution but the bottom line remains the same.
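For scripted access, the same principle can be demonstrated with the Python requests library (an assumed dependency). The proxy address below is a placeholder and the IP-echo service is just one of many such sites:

# Sketch: send a request through an HTTP proxy and check which IP the
# destination sees. The proxy host/port here are placeholders.
import requests

proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}

resp = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=15)
print(resp.text)   # shows the proxy's IP, not ours, if the proxy works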

Though a proxy can be used for many other purposes apart from being anonymous,
we will focus only on anonymity as the chapter demands.

Before getting into the deeper technical aspects of proxies, let's look into a
workaround to be anonymous. In earlier chapters we learned how to use search
engines efficiently and power searching. Now it's time to look into how a search
engine can be used as a proxy to provide anonymity.

As Google is a popular search engine it can also be used as a proxy with its feature
called Google Translate. Google provides its services in many countries apart
from the English-speaking ones and it also supports multiple languages. The Google
Translate option allows a user to read web content in any other language the user
wants. For a generic example, non-English content can be translated to English
and vice versa. So this feature allows a user to use a Google server to forward the
request and collect the response on his/her behalf, which is the basic fundamental
of a proxy.
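Programmatically, this trick boils down to handing the target address to the translation service as a URL parameter. The parameter names below (sl, tl, u) reflect the commonly seen pattern at the time of writing and may change, so treat this purely as an illustration:

# Illustration only: build a Google Translate URL that wraps another site,
# so the request to that site is made by Google's servers on our behalf.
from urllib.parse import urlencode

target = "http://whatismyipaddress.com/"
params = {"sl": "auto", "tl": "en", "u": target}   # assumed parameter names
print("https://translate.google.com/translate?" + urlencode(params))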

Now to test this, first we will look at our own IP address using a site
called http://whatismyipaddress.com/ and later use Google Translate to check the
same site. The job of this site is to tell us the IP address used to send the request to
it. If the IP address differs between normal browsing and browsing through Google
Translate, it means we achieved anonymity using Google Translate.


149 




[Figure content: whatismyipaddress.com opened through Google Translate reports the IP address 66.249.82.121, with ISP "Google" and the service flagged as a confirmed proxy server, instead of the user's own IP.]


FIGURE 8.3 

Page opened inside Google Translate. 


We can see from the above image that the IP addresses of direct browsing and of
browsing using Google Translate are different. Thus it is proved that we can use Google
Translate as a proxy server to serve our purpose. In many cases it will work fine. Though it's
just a workaround, it's very simple and effective. In terms of full anonymity it might not
be helpful, but we may still use this method where we need a quick anonymity solution.

PROXY IN TERMS OF ANONYMITY 

We just came across one example where we can use a search engine feature as a proxy.
But the point to be considered is anonymity. There are different levels of anonymity
based on different proxy solutions. Some proxies just hide our details but keep the
same in their logs, and sometimes a proxy can be detected as a proxy by the server
while others might not be. That's not the best solution if we want full anonymity. There
are some solutions available which cannot be detected as proxies by the destination
server and also delete all the user details the moment the user ends the session. Those are the
best solutions for full anonymity. It all depends on our requirements what
service or what kind of proxy we want to use, because a fully anonymous proxy might
charge the user some amount to use the solution.

TYPES OF PROXY SOLUTIONS 

Now, there are different types of proxy solutions available; some are classified based on
anonymity and others based on their type, such as whether they are application-based or web-based.
So let's start exploring some of the available options in application-based proxies.




[Figure content: the UltraSurf 14.04 window showing server options, a local listener on 127.0.0.1:9666, and the status "Successfully connected to server!"]


FIGURE 8.4 


UltraSurf interface. 


A small drawback of this tool is that it supports only Windows. Another
drawback is that IP-checking solutions detect it as a proxy server. But
as we discussed earlier, it can be used in various other conditions based on our
requirements and it's easy to use: just download, run, and browse anonymously.

JonDo 

JonDo, previously known as JAP, is a proxy tool available at https://anonymous-proxy-servers.net/en/jondo.html.

It is available for a wide range of operating systems such as Windows, Mac, different
flavors of Linux, and also for Android mobiles. The full-fledged documentation
on how to install and use it makes it very accessible as a proxy solution. Different proxy
solutions come in different forms; JonDo also provides one of its variants for anonymous
Firefox browsing, known as JonDoFox.

Before exploring JonDo let's first look into the Firefox anonymous browsing
solution, i.e., JonDoFox. It can be found at https://anonymous-proxy-servers.net/
en/jondofox.html.

Like JonDo, JonDoFox is also available for different operating systems such as
Windows, Mac, and Linux. Users can download it as per their operating system from
the above URL. The documentation on how to install it is also available just next to the
download link. But let's download and install it while we discuss more about the same.

Windows users will get JonDoFox.paf after downloading. Installing it
will create a Firefox profile in the name of JonDoFox. If the user selects this profile,
the many Firefox addons it consists of, such as a cookie manager, adblocker, etc.,
come into action. But to use it for full anonymity the user needs to install certain
dependent software such as Tor.




It's good to use JonDoFox but the user has to install all the dependent software
after installing it. Some might not love to do so, but still this is a great solution
to browse anonymously.

Like JonDoFox, JonDo can also be downloaded from the above URL. It will give
us the installer. Windows users will get an exe file "JonDoSetup.paf" after downloading.
The installation can be done for the operating system we are using, and also
as a portable version that can be carried on a USB drive. The user needs to
choose according to his/her requirements. The only dependency of this software is
Java, but as we discussed how to install that earlier we are not going to touch
it here again; by the way, while installing, this software also installs Java if
it doesn't find a compatible version available in the operating system. Once JonDo is
installed, we can double click on its desktop icon to open it. By default, after
installation it creates a desktop icon and enables itself to start on Windows startup.

JonDo provides full anonymity and fast connections only to premium users, but
we can still use it. The first time, we need to activate it with its free code. A test
coupon can be found at https://shop.anonymous-proxy-servers.net/bin/testcoupon?
lang=en but we need to provide our e-mail address to get it.



FIGURE 8.5 

JonDo interface. 

After providing the e-mail address we will get a link in our inbox. Visit the
link to get the free code. Once we get the free code, put it into the software to complete
the installation process.




As we discussed the pros and cons of this service, it's still a very good proxy solution
for anonymous browsing and there are some other features available, like sending e-mail and
checking e-news. But as we are more focused on hiding our details while browsing,
we will conclude this here.

Zend2 

It is also a web-based proxy solution, unlike anonymouse.org, which only supports
the HTTP protocol. So a user cannot use anonymouse.org to browse popular sites such as
Facebook and YouTube, as these sites force the use of an HTTPS connection.

https://www.zend2.com/ has no restrictions on HTTPS-enabled, or technically
SSL-enabled, sites. It allows the user to surf both HTTP and HTTPS sites. So a user can use
it to check his/her e-mails also.


4" fl^ttps^^vA^endiconJ ~ C 0 • i«nd2 

Most Visited 4ft Getting Suited Suggested Sites Web Slice Gallery 



frotatcjoin of S^><*<*cdb> 


r-1 

HOME ABOUT US PRIVACY POLICY CONTACT U 





> 

Free Membership 
Insight Survey 
Report 

$$##4 jd 

(fiJiic! Apricot ^G 



SURF > 


(Options) 


[3 Encrypt URL □ Encrypt Page 13 Allow Cookies 

|?1 Remove Scripts (7) Remove Objects 

FIGURE 8.11 

Zend2 homepage. 

Apart from that, for two popular web resources, Facebook and YouTube, it
also provides special GUIs. For Facebook: https://zend2.com/facebook-proxy/.
For YouTube: https://zend2.com/youtube-proxy/. The YouTube proxy page contains
instructions on how to unblock YouTube if it's blocked in your school, college, office,
or by the ISP, while the Facebook proxy page contains general information on how this
web proxy works.





FIGURE 8.14 

CyberGhost interface. 

The interface of the application is pretty simple. We can make configuration
changes and also upgrade to a paid account from it. On the home screen the application
will display our current IP address with the location on a map. To start using
the service we simply need to click on the power button icon. Once we click on it
CyberGhost will initiate a connection to one of the servers and will display a new
location once the connection is made.



FIGURE 8.15 

CyberGhost in action. 

In the settings menu of CyberGhost we can also make changes such as Privacy
Control and Proxy, which further allow us to hide our identity while connected online.



Hideman 

Similar to CyberGhost, Hideman is another application which allows us to conceal
our identity. The client for the application can be downloaded from https://www.hideman.net/.
Like CyberGhost, in Hideman also we don't need to make many configuration
changes before using it; simply install the application and we are good to
go. Once the application is installed, it provides a small graphical interface which
displays our IP and location. Below that there is an option where we can choose the
country of connection, which is set to "Automatically" by default. Once this is done
we simply need to click on the Connect button and the connection will be initiated.
Currently Hideman provides free usage for 5 hours a week.



FIGURE 8.16 

Hideman interface. 




Apart from the mentioned services there are also many other ways to utilize VPNs
for anonymity. Some service providers provide VPN credentials which can be configured
into any VPN client and used; others provide their own client as well
as the credentials.


ANONYMOUS NETWORKS 

An anonymous network is a bit different in the way it operates. In it the traffic is
routed through a number of different users who have created a network of their own
inside the internet. Usually the users of the network are the participants and they help
each other to relay the traffic. The network is built in a way that the source and the
destination never communicate directly with each other; the communication is done
in multiple hops through the participating nodes and hence anonymity is achieved.

The Onion Router 

Tor stands for "The Onion Router." It is one of the most popular and widely used methods
to stay anonymous online. It is basically a piece of software and an open network which
allows its users to access the web anonymously. It started as a US Navy research
project and is now run by a nonprofit organization. The user simply needs to download
and install the Tor application and start it. The application starts a local SOCKS
proxy which then connects to the Tor network.
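Once that local SOCKS proxy is up, any SOCKS-aware application can be pointed at it. A minimal sketch with the Python requests library (with its SOCKS support installed, an assumption on our part); port 9050 is the usual default for the standalone Tor service and 9150 for the browser bundle, so adjust as needed:

# Sketch: route an HTTP request through Tor's local SOCKS proxy.
# "socks5h" makes DNS resolution happen inside the Tor network too,
# which helps avoid the DNS leak issue discussed later in this chapter.
import requests

tor_proxy = "socks5h://127.0.0.1:9050"   # 9150 for the Tor Browser bundle
proxies = {"http": tor_proxy, "https": tor_proxy}

resp = requests.get("https://check.torproject.org/", proxies=proxies, timeout=30)
print("Congratulations" in resp.text)    # True if the request went via Tor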

Tor uses layered encryption over bidirectional tunnels. What this means is that
once the user is connected to the Tor network, he/she sends out the data packet with
three layers of encryption (default configuration) to the entry node of the Tor network.
This node removes the uppermost layer of encryption, as it has the key
for that layer only, but the data packet is still encrypted, so this node knows the sender but
not the data. The data packet then moves to the second node, which similarly removes
the current uppermost encryption layer, as it has the key for that layer only; this node
knows neither the data nor the original sender. The packet further moves to
the next node of the Tor network, which removes the last encryption layer using the
key which works for that layer only. This last node, also called the exit node, has
the data packet in its raw form (no encryption), so it knows what the data is, but it is
not aware of who the actual sender is. This raw data packet is then
sent over the public internet to the desired receiver, without revealing the original sender.
As already stated this is bidirectional, so the sender can also receive the response in
a similar fashion. One thing that needs to be mentioned here is that the nodes of the Tor
network between which the data packet hops are chosen randomly; once the user
wants to access another site, the Tor client will choose another random path between
the nodes in the Tor network. This complete process is termed onion routing.
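To make the layering idea concrete, here is a purely conceptual sketch using the Python cryptography library (an assumed dependency). This is not Tor's actual protocol; it only shows how a message wrapped in three layers can be peeled one layer per hop, each node using only its own key:

# Conceptual sketch of onion-style layering, NOT Tor's real protocol.
from cryptography.fernet import Fernet

keys = [Fernet.generate_key() for _ in range(3)]   # one key per relay node
message = b"GET http://example.com/"

# Sender wraps the message: exit-node layer first, entry-node layer last.
packet = message
for key in keys:
    packet = Fernet(key).encrypt(packet)

# Each node peels exactly one layer with its own key (entry node first).
for key in reversed(keys):
    packet = Fernet(key).decrypt(packet)

print(packet == message)   # True: only the last (exit) node sees the raw data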

So Tor is pretty good at what it does and we just learned how it works. But as the traffic
needs to pass through different nodes (relay points) and there are also cryptographic functions
involved, it is pretty slow. Apart from this we are also trusting the exit
nodes with the data (they can see the raw packet).

Tor is available in many different forms: as a browser bundle, as a complete
OS package, etc. The browser bundle is the recommended one as it is completely



preconfigured, very easy to use, and comes with additional settings which help to
keep the user safe and anonymous. The browser bundle is basically a portable Firefox
browser with Tor configured. It also contains some additional addons such as HTTPS
Everywhere and NoScript. The Tor browser can be downloaded from https://www.torproject.org/download/download-easy.html.en.
Once it is downloaded we simply need
to execute the exe file and it will extract itself into the mentioned directory. After this
has been completed we simply need to execute the "Start Tor Browser" application.
It will present us with the
choice to connect directly to the Tor network or configure it before going forward.
General users simply need to click on the Connect button; in case the network we are
connected to requires a proxy or other advanced settings, we can click on the Configure
button to make these settings first. Once we are good to go, we can connect to the
network and the Tor browser will open up as soon as the connection is made. Apart
from this, other packages which allow us to run bridge, relay, and exit nodes can be
downloaded from https://www.torproject.org/download/download.html.en.



FIGURE 8.17 

Tor Browser. 

Apart from allowing users to surf the web anonymously, Tor also provides another
interesting service, about which we will learn in the next chapter.

Invisible Internet Project 

I2P stands for Invisible Internet Project. Similar to Tor, I2P is also an anonymous 
network. Like any network there are multiple nodes in this network, which are used 
to pass the data packets. As opposed to Tor, I2P is more focused on internal services. 




Similar to Tor, I2P also provides other services, which we will discuss in the next chapter.

Browser addons like FoxyProxy (http://getfoxyproxy.org/) can be used to make
proxy changes easily in the browser.

The individual techniques we have discussed in this chapter can also be chained
together to make it more difficult to be traced. For example, we can connect to a
VPN-based proxy server, further configure it to connect to another proxy server in
another country, and then use a web-based proxy to access a website. In this case the
web server will only get the IP address of the web-based proxy used to connect to
it, that web proxy will get the IP address of the proxy server we connected to through the VPN,
and we can increase the length of this chain further by connecting one proxy to another.
There is also a technique called proxy bouncing or hopping, in which the user keeps
on jumping from one proxy to another using an automated tool or a custom script with
a list of proxies; this way the user keeps changing his/her identity after a short
period of time and hence becomes very difficult to trace. This can also be implemented
at the server side.
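A custom script of the kind mentioned above could look roughly like the following sketch (the proxy addresses are made-up placeholders from documentation ranges, and the requests library is an assumed dependency); real tooling would also validate each proxy and handle failures more carefully:

# Rough sketch of proxy bouncing: rotate through a list of proxies so that
# consecutive requests appear to come from different IP addresses.
import requests

proxy_list = [                        # made-up placeholder addresses
    "http://198.51.100.10:3128",
    "http://203.0.113.25:8080",
    "http://192.0.2.77:8000",
]

for proxy in proxy_list:              # each request leaves through a different proxy
    try:
        r = requests.get("http://httpbin.org/ip",
                         proxies={"http": proxy, "https": proxy}, timeout=10)
        print(proxy, "->", r.text.strip())
    except requests.RequestException as err:
        print(proxy, "-> failed:", err)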


Some scenarios in which people still get caught after using these tools/techniques: 

• The user of a specific network (e.g., a university) is known, and it is also known which one
of them was connected to a specific proxy server/Tor around a specific time.

• Rogue entry and exit points. In an anonymous network like Tor if the entry point and the exit 
point can correlate the data packet based on its size or some other signature, they can identify 
who the real sender might be. 

• DNS leak. Sometimes even when we are connected to an anonymous network our machines 
might send out the DNS requests to the default DNS server instead of the DNS server of the 
anonymous network. It means that the default DNS server now may have a log that this specific 
address resolution was requested by this IP at this point of time. 

• Leaked personal information. Sometimes people who are anonymous on the internet leak
information which can be used to directly link back to them, such as phone numbers, the same
forum handles they use when they are not anonymous, unique IDs, etc.

• Metadata. As discussed in the last chapter there is so much hidden data in the files that we use 
and it might also be used to track down a person. 

• Hacking. There can be security holes in any IT product which can be abused to identify the real 
identity of the people using it. 

• Basic correlation. As shown in the first scenario, correlation can be used to pinpoint someone 
based on various factors such as timing, location, restricted usage, and other factors. 


Some of the suggestions/warnings for using Tor are listed at
https://www.torproject.org/download/download-easy.html.en#warning. These should be followed
with every tool/technique discussed above, where applicable. Also use a separate 
browser for anonymous usage only and do not install addons/plugins which are not 
necessary. 

So we learned about various ways to stay anonymous online, but as stated earlier 
100% anonymity cannot be guaranteed online. What we can do is to try to leak as 
little information about ourselves as possible. The methods discussed in this chapter
are some of the most popular and effective ways to do this. Online anonymity has
various legitimate use cases such as privacy, protest, accessing content restricted by
an authority, business, law enforcement, and journalism, but it can also be used
by people to perform illegal activities like malicious hacking and illegal online trade.




CHAPTER 9

Deepweb: Exploring the Darkest Corners of the Internet

INFORMATION IN THIS CHAPTER 

• Clearweb 

• Darkweb 

• Deepweb 

• Why to use it 

• Why not to use it 

• Deepweb: Tor, I2P, Freenet 



INTRODUCTION 

In this chapter we will start from where we left off in the previous one. We learned about
various tools and techniques to stay anonymous online and also discussed some of the ways
in which people still get caught. Here we will deal with terms like darknet and deepweb
and understand some of the fundamental differences.

One of the most efficient ways discussed to stay anonymous was connecting 
to the anonymous networks like Tor and I2P. We will take this topic further and 
see what else we can do with it and how it relates to the topic of interest for this 
chapter. 

Until the recent past, terms like darknet and deepweb were not very popular. They were
mostly a topic of interest for people who wanted to stay anonymous and for those related to IT
(especially information security). Recently there have been some news stories related
to these topics, which have made people interested in understanding what
they are, how they operate, what to expect there, etc. We will cover all those things
here and see if there is anything of interest for us.

Before going any further with the technical details, let's understand the basic
definitions of the terms we will be dealing with in this chapter.

CLEARWEB 

We have already discussed in previous chapters how search engines work. Simply
stated, they work by following the links on a web page, then the links on the next
page, and so on. So the part of the web which can be accessed by a search engine is
called the clearweb. What this means is that anything we get as a result of a search
engine query is part of the clearweb.
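A minimal sketch of this link-following behavior, in Python using only the standard
library (the seed URL and page limit are arbitrary examples), could look like this:

import re
import urllib.request
from collections import deque

def crawl(seed, limit=20):
    # breadth-first walk over hyperlinks, the same basic idea a search
    # engine spider uses to discover pages on the clearweb
    seen, queue = set(), deque([seed])
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue
        # collect absolute links and queue them for the next hop
        queue.extend(re.findall(r'href="(https?://[^"]+)"', page))
    return seen

print(crawl("http://example.com/"))

Real crawlers add politeness rules (robots.txt, rate limiting) and far smarter link
handling, but the core loop is the same: fetch a page, extract its links, and queue
them for the next hop.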


DARKWEB 

As a user we have clicked on different links on a webpage, but that is not the only 
way we interact with a website. Sometimes we have to submit some text to get the 
desired page (e.g., search box), sometimes we have to authenticate before accessing 
a specific page (e.g., social network website login), sometimes there are things like 
CAPTCHA which need to be entered before moving further. 

So apart from the web that is accessed by search engines there is still a huge 
amount of data that exists in pages not touched by web spiders/crawlers. This part of 
the web is known as darkweb or darknet. 

DEEPWEB 

Now we have a clear separation of the web into two parts, clearweb and darkweb, 
based upon their accessibility to the search engine. Now we will move a little 
deeper. 

The darkweb comprises a huge part of the overall web. Inside this darkweb
there exists another section which is called the deepweb. This deepweb is also not
accessible to search engines, but in addition it cannot be accessed directly by the
standard browsers we use daily. This portion of the web is hidden deep inside the web
and requires special applications and configurations to be accessed and hence is called
deepweb.

Now we have a clear understanding of what darkweb and deepweb are. We are
well aware of how to access the regular darkweb and do it on a regular basis. Pages
like social media profiles which require login, search result pages in a website, and
dynamically generated pages are some of the examples. However if we need to access the
deepweb, we need to make special arrangements. Before getting into those details
let's understand a bit more about the deepweb.

As stated earlier, deepweb is a part of darkweb. Now the question arises: how
come it exists inside the darkweb but is still not directly accessible? The answer
is that it exists in the form of a network inside the internet, which in itself is a huge
network. This means that the deepweb is created as a part of the internet, but to access
this specific network we need to have the right tools so that a connection can be made
to it. Once we have the means to connect to it we can access it.

In this fancy land of deepweb we can find all sorts of things like illegal drugs, 
weapons, art, and all sorts of black market things. On the other hand it is also used by 
people to speak freely, exchange ideas, etc. 



WHY TO USE IT? 

If we are a whistleblower, cyber investigator, cyber journalist, government intelligence
agent, or cyberspace researcher then this is the place for us. This will help us understand
how the underground cyberspace works. It will give us ideas about the private
0-days, targets, and attack patterns of cyber-crime, etc. It will help us predict the next
attack pattern by understanding the underground community mind-set through the
technology they use most frequently.

It also provides freedom of speech, so if you want to protest for a good cause this
is the place for you. For the investigation of a cyber-crime this can be a popular place:
as most of the underground community works here, there is a chance of getting an
ample amount of proof from this place. It can also be used to keep track of the online
activities of a person or group.

There are dedicated services for optimized use of deepweb such as secure file 
uploading facilities where activists or whistleblowers can anonymously upload 
documents. There are services related to people spreading word about things that others
should know, sharing what's happening all around them, etc. There are online forums to
discuss technology, politics, and much more; so if we have these kinds of specific
requirements or similar ones then we can use the deepweb.


WHY NOT TO USE IT? 

Apart from utilizing this space for ethical motives some people also use it to perform 
many illegal activities. There are many places in this area where we can find people 
selling drugs, fake IDs, money laundering services, hackers for hire, etc. Some websites
even say that they provide assassins for hire. Apart from this it might also contain
websites which provide many disturbing things. One must be very careful while accessing
or downloading any content from such places, as it might be illegal to access it or have it
on our computers.


DARKNET SERVICES 

TOR 

One of the most popular portions of the deepweb is the *.onion domains. In the last
chapter we learned about Tor, how it works, and also how to use it to stay anonymous.
The same Tor also allows us to create and access one of the largest portions of the
deepweb. We are already aware of how to use the Tor browser bundle to access the
regular web; now that same tool can be used to access places which are not directly
reachable.




We simply need to download the Tor browser bundle, extract it, and run the Tor
browser. Once the connection to the Tor network is made we are good to go. Apart
from accessing regular websites, Tor allows us to create and access *.onion websites.
If we try to access these websites through a regular browser without Tor configured,
they will simply display a "The webpage is not available" message or some other error
or redirect message, whereas they will open up like a regular website through the Tor
browser or a browser configured to access the internet through Tor as a proxy.

Let’s start exploring these Tor-based domains. One of the most common 
places to start with is “The Hidden Wiki.” The address of this wiki is 
http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page. Notice that this URL 
does not contain a .com, .net, .org, or other familiar domain names, but is .onion. 
First try to open this URL in a regular browser; does it open up? Now open this
URL in our Tor browser. We will get a webpage which contains a wiki page with
a huge list of other .onion domains divided category wise. The categories listed 
are financial services, anonymity and security, whistleblowing, P2P file sharing, 
etc. We can explore this wiki further and check out some of the interesting links 
listed in it. 



FIGURE 9.1 

The Hidden Wiki. 
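Outside the browser, the same Tor connection can also be used for scripted access to
these .onion addresses. Here is a minimal Python sketch, assuming a local Tor instance
with its SOCKS listener on 127.0.0.1:9050 (the Tor Browser bundle usually exposes 9150
instead) and the requests library installed with its SOCKS extra (pip install
requests[socks]):

import requests

# socks5h makes DNS resolution of the .onion name happen inside Tor as well,
# avoiding the DNS leak issue discussed in the previous chapter
TOR_PROXY = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

# The Hidden Wiki address mentioned above
url = "http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page"
response = requests.get(url, proxies=TOR_PROXY, timeout=60)
print(response.status_code)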


Similarly there is another wiki, "Tor Wiki," which lists a huge number of .onion domains.
It also organizes the various categories in a neater way. This wiki makes it easier to explore
the listed domains by marking them as verified, caution, or scam.





FIGURE 9.2 

TOR Wiki. 

The search engine DuckDuckGo, which we discussed in a previous chapter, also
has a .onion address, http://3g2upl4pq6kufc4m.onion/. Using this we can search the 
clearweb from a Tor domain. 



FIGURE 9.3 

DuckDuckGo Search Engine (.onion domain). 




There are also some search engines such as TORCH (http://xmh57jrzrnw6insl.onion/)
available to search the deepweb, but they seldom work properly.

As we can see in the wiki lists there are various marketplaces which sell illegal
drugs. One of the most popular ones was called "Silk Road," which was recently
brought down by the FBI, but a new one has come up to take its place and is called "Silk
Road 2.0." Similarly there are many other places which claim to have illegal items,
as well as various forums, boards, internet relay chats (IRCs), and other places which
provide like-minded people a platform to discuss and learn things. One such board is
TorChan (http://zw3crggtadila2sg.onion/imageboard/). There are various topics
such as programming, literature, privacy, etc., on which people discuss their views.



FIGURE 9.4 

TorChan. 


Till now we have seen how to access .onion domain websites, now let’s see how to 
create these. To create a .onion site first we need to have a local web server. XAMPP 
is one such option which uses Apache as a server. Once the server is installed and 
configured to host a local website, we need to modify the "torrc" file. This file can
be found at the location “Tor Browser\Data\Tor”. Open this file in an editor and add 
the following lines to it: 

HiddenServiceDir C:\Tor\Tor_Browser\hid
HiddenServicePort 80 127.0.0.1:80 

The path in front of "HiddenServiceDir" is the path where Tor will create files to
store information related to the hidden service we are creating. The part in front of
"HiddenServicePort" specifies the virtual port on which the hidden service will be
available, followed by the local address and port to which the traffic is redirected.



We have seen how to create a Tor hidden service, but for it to be safe and
anonymous we need to take various steps, as follows:

• Configure the server to not leak any information (e.g., Server Banner, error 
messages). 

• Do not run any service on that machine which might make it vulnerable to any 
attack, or might reveal the identity. 

• Check the security of the web application hosted. 
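Once Tor is restarted with the torrc entries shown earlier, it writes the generated
.onion address into a hostname file inside the HiddenServiceDir. A minimal sketch of
reading it in Python, using the Windows path from our example:

# the directory below is the HiddenServiceDir value from the torrc edit above
with open(r"C:\Tor\Tor_Browser\hid\hostname") as f:
    onion_address = f.read().strip()
print(onion_address)  # something of the form xxxxxxxxxxxxxxxx.onion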

Tor also allows us to run a hidden service through relays, but it is not advised.
Relays are nodes which take part in transferring the traffic of the Tor network and
act as routers for it. Relays are of different kinds: middle relays, which are the starting
and middle nodes in the packet transfer chain; exit relays, which are the final
node in the chain and connect directly to the receiver; and bridges, which are the
relays that are not publicly listed as Tor relays. Bridges are helpful when we are
connecting to the internet through a monitored/managed network (e.g., a college
network) as they make it difficult to identify whether a user on that network is connected
to Tor. The applications to run these services can be downloaded from the page
https://www.torproject.org/download/download.html.en.
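For reference, a relay or bridge is also configured through torrc. A minimal sketch of
the relevant options for a standalone Tor instance could look like the following (the
nickname and port values are only placeholders):

ORPort 9001
Nickname MyRelayNickname
ExitRelay 0

Adding the line "BridgeRelay 1" turns the node into a bridge, which is not listed in the
public relay directory.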


I2P 

Like Tor, we also learned how to be anonymous using I2P. In this chapter we
will not focus on the anonymity part again but on how I2P can help us to
access/create the deepweb.

Though we will find a number of places listing lots of marketplaces and hidden
services related to I2P or accessible through I2P, and in most places the sites will
claim the authenticity of the services provided, it's better to cross-check manually
before using or accessing any of them to avoid unknown consequences.

We already know how to install I2P, as we learned the same in the last chapter,
but for quick reference we can easily download and install it from the following
URL: https://geti2p.net/en/download (here we can get bundles for Windows, Mac,
different Linux versions, and also for Android). Download the bundle according
to your device and operating system and install it. After installation, once we open
I2P it will open in localhost (http://127.0.0.1:7657/home), or else, as we learned
in the last chapter, we need to manually type this web address in the address bar of
the browser. After opening the same in the browser, once we get "Network OK" in the
top left corner of the page, configure the browser proxy settings to 127.0.0.1:4444 to
access all the sites. For IRC we can use localhost:6668 in our IRC client and
can use #i2p for chat. After changing the browser proxy setting we will be able to visit
the eepsites with the *.i2p extension.
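The same proxy setting can also be used from a script. A minimal Python sketch,
assuming the local I2P router is running with its HTTP proxy on 127.0.0.1:4444 and the
requests library installed (forum.i2p is one of the eepsites mentioned below):

import requests

# eepsites are only reachable through the I2P router's local HTTP proxy
I2P_PROXY = {"http": "http://127.0.0.1:4444"}

response = requests.get("http://forum.i2p/", proxies=I2P_PROXY, timeout=120)
print(response.status_code)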


FIGURE 9.7 

I2P home. 


Some of the sites are listed on the router homepage as shown in the figure.

E.g., Anonymous git hosting: http://git.repo.i2p/

Here, though we need to provide some details to push to the repository, the identity
provided by us will not link to our real IP address; in this way we can use git
anonymously.



FIGURE 9.8 

I2P git. 

Free web hosting: http://open4you.i2p/ 




Here we can get details of how to use the free web hosting service. Other details
can be found in the forum maintained at the following URL:

http://open4you.i2p/index.php

If we want to host any kind of website in the deepweb, this can be helpful.

Pastebin: http://pastethis.i2p/

It is a website generally used to save text online for a period of time for personal use.
But popularly it is used as a source to provide credentials, latest cyber news, defaced
site details, cyber-attack target details, etc. Though in the normal pastebin we need to
provide certain details to paste something, here no details are required.

We can also find all the pastes at the following URL: http://pastethis.i2p/all/.



FIGURE 9.9 

I2P based Paste data service. 

Forum: http://forum.i2p/

It's a general forum to discuss different things in different sections of threads.
The topics may be related to I2P or something else. Depending upon the area of
interest, take a membership, log in, and read, create, or edit posts based on the
permissions provided by the site.


FIGURE 9.10 






I2P based forum. 






Microblog: http://id3nt.i2p/

Id3nt is a microblogging site like Twitter. Here we can post whatever we want; we
can share our views, discuss a particular topic, or reply to posts of our interest.
It's quite similar to a normal microblogging site.



FIGURE 9.11 


Id3nt. 




How to create our own site using I2P:

To create our own anonymous I2P web server we need to edit files at the following
path. In the case of a Windows machine the path is %APPDATA%\I2P\eepsite\docroot\
and in the case of a Linux machine the path is ~/.i2p/eepsite/docroot/.
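As a small illustration, the placeholder page can also be dropped into that docroot
from a script. A minimal Python sketch for the Windows path above (the page body
mirrors the test page shown in the following figures):

import os
from pathlib import Path

# %APPDATA%\I2P\eepsite\docroot\ is the Windows docroot mentioned above;
# on Linux it would be ~/.i2p/eepsite/docroot/
docroot = Path(os.path.expandvars(r"%APPDATA%")) / "I2P" / "eepsite" / "docroot"
page = ("<html><head><title>I2P Anonymous Webserver</title></head>"
        "<body><p>This is a test site</p></body></html>")
(docroot / "index.html").write_text(page)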


FIGURE 9.12 

Eepsite files. 




FIGURE 9.13 


Edit file. 


After completing all the edits we need to set up the server configuration details from
the following URL: http://127.0.0.1:7657/i2ptunnel/edit.jsp?tunnel=3.
The options are shown in the figure below.



FIGURE 9.14 

Edit server settings. 


By default the site created can be accessed locally from the following path:
http://127.0.0.1:7658.




FIGURE 9.15 

Local site. 


Though we can edit the same from the server settings, additionally we can set up the
name, description, protocol, IP and port number, as well as the domain name
from the above server edit page. There are certain advanced options, but the options
are quite straightforward, so we can easily configure our web server and anyone can
access the same using the provided domain name.

Once all the configurations are complete, save the details. We will get the page
where we need to start the web service we just configured, as shown in the figure below.


FIGURE 9.16 

Starting the service. 

Sometimes we need to add the domain name and a long Base64 key generated by
the page to the router address book to access the site, as shown in the image below.



FIGURE 9.17 

Adding the name. 

Now we can access the page by the domain name. In my case, as is quite clear from the
above figure, the name is http://devilszone.i2p/.

Here is the figure showing the same using the domain name in the browser.



FIGURE 9.18 

Service running. 

Here we learned how to find different internal sites with the *.i2p extension,
how to access them using I2P, and how to create our own I2P site for providing
services. This will help us to understand the deepweb quite easily.


FREENET 

Similar to Tor and I2P there is yet another anonymous network, freenet. It is one 
of the oldest networks around and is known for P2P file sharing capabilities. The 
applications can be downloaded from https://freenetproject.org/download.html. 
Once downloaded, simply install the application and run it. 

Freenet will open up a browser once it is run. The webpage displayed will provide us
a series of choices to determine the security and data usage limits we desire and
then perform the setup accordingly. Once this setup is complete, we will be presented
with the freenet homepage. This page contains links to indexes of freenet websites (called
freesites), similar to the Tor wikis, and documentation related to other associated software
and HOW TO guides. On the homepage there is also a search box which allows us to search
through freesites. Using certain plugins such as Freetalk and Freemail we can also
have communication over freenet.




FIGURE 9.19 

Freenet homepage. 




Enzo’s index is one such index which lists many freesites and has divided them 
under categories. Another such list is Linkageddon. 



FIGURE 9.20 

Freenet Enzo’s Index. 



FIGURE 9.21 

Freenet Linkageddon. 




Freenet also allows us to connect with people whom we already know and who are
using freenet, under the URL http://localhost:8888/friends/. For this we simply
need to exchange a file called noderefs with them, provide this file on the
mentioned page, and simply click on the add button at the bottom. Under the URL
http://localhost:8888/downloads/ we can perform file sharing operations. Similar to
the other networks discussed, freenet also allows us to create and share our websites in
its network. Freenet maintains its own wiki, https://wiki.freenetproject.org, which
lists information related to its different features and how to perform different
operations, including freesite setup.

Apart from these mentioned networks there are also some other networks which 
provide similar functionalities, but Tor, I2P, and freenet are the most popular ones. 

In this chapter we moved on from exploring the regular internet and learned about
some less explored regions of it. We discussed the deepweb in detail: how to
access it, how to create content in it, and what to expect there. We also learned about its
uses and how it is misused. We have also shared some associated resources which
will help to explore it further, but be warned, you never know what you might
find there, so act with your own discretion.

Till now we have learned about various tools, techniques, and sources of 
information which might help us to utilize the internet in a better and efficient 
way. Moving ahead we will learn about some tools and their utility in managing, 
visualizing, and analyzing all the collected data so that we can better understand and 
utilize the raw data to get actionable intelligence. 


DISCLAIMER 

The part of the internet that will be discussed in this chapter might also contain 
illegal and/or disturbing things. Readers are advised to use their discretion and act 
accordingly. 





CHAPTER 10

Data Management and Visualization


INFORMATION IN THIS CHAPTER 

• Data 

• Information 

• Intelligence 

• Data management 

• Data visualization 


INTRODUCTION 


Till now we have learned about gathering data using different methods. Generally
people think that open source intelligence (OSINT) means collecting data from different
internet-based open source options. But it's not limited to that, because if the data
we collected from different sources is not categorized properly or we cannot find
relations between the pieces, it can be just a huge amount of random data that is of
no use. We will discuss the need for managing data and analyzing its worth later, but
for the time being let's refresh what we have learned so far about how to collect different
data using different sources.

From the very beginning we have focused on data extraction using different
methods. We started with search engines, where generally a normal user gets
answers for all their questions, and we also discussed how that is just a minute part
of the web, as popular conventional search engines have only a limited amount of
the internet indexed in their databases. So we learned how to use other specific search
engines to get specific data, and covered some popular features of mainstream search
engines that make them unique as compared with others. Further we learned about some
interesting tools and techniques to find data which is openly available. Later
we moved on to power searching and learned how to get the desired information from
the web effectively. Then we moved to metadata and how it can be helpful: we
learned how to get all the metadata information and how we can use it for different
purposes. Last but not the least, we covered the deepweb, the part of the web which is
not directly indexed by conventional search engines, and learned how to access it
to get more information.

So for the time being we can say that we learned how to collect data from different
sources directly, using some well-known solutions and also using some
unconventional tools that open even more doors to collect data. Before going any
further, let's discuss a bit about what data, information, and intelligence are and how
they differ from one another.

DATA 

"Data" is one of the most commonly used terms in any domain, especially IT. If we
describe data in simple words, it means the raw form of entities. It is the depiction
of facts in a basic form. For example, say we get a text file that consists of entries like
abc.inc, xyz.com, john, 28, info@xyz.com, CTO, etc. We can see there are
certain entities, but they carry no meaning on their own. This is the raw form. In its
original form, data does not have much worth.


INFORMATION 

The systematic form of data is called information. When data is categorized based
on its characteristics it can be called information. We can say that the aggregated
and organized form of data is information; hence to achieve information we need to
process data. Let's take the same example: abc.inc is a company name, xyz.com is a
domain, john is a username, 28 is an age, info@xyz.com is an email address registered
with xyz.com, and CTO is a position.
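To make the distinction concrete, here is a minimal Python sketch of the same example,
where the raw tokens are labeled by type to become information (purely illustrative):

# raw data: just entities, with no meaning attached
raw_data = ["abc.inc", "xyz.com", "john", 28, "info@xyz.com", "CTO"]

# information: the same data, categorized by its characteristics
information = {
    "company":  "abc.inc",
    "domain":   "xyz.com",
    "username": "john",
    "age":      28,
    "email":    "info@xyz.com",
    "position": "CTO",
}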

INTELLIGENCE 

When we relate different pieces of information based on their relations with one another
and derive a meaning out of that, what we get is called intelligence. So we
analyze and interpret the information at hand according to the context to achieve
intelligence. From the same example we can derive that xyz.com and info@xyz.com
belong to the same domain, and that it is quite possible that john, who is 28 years old,
is the CTO of abc.inc. These are primary predictions and they may also be false positives,
so we need to validate them later, but for the time being the information that we have
looks related, so we can conclude that John, who is 28 years old, is the CTO
of abc.inc, that the domain name of the same company is xyz.com, and that the email id to
communicate with is info@xyz.com.

To validate this we will need to extract information from other sources, and we might
get to know that the CTO of abc.inc is someone named John, that there
is a John who works at abc.inc whose email is info@xyz.com, and similar information
which might correlate to prove our theory right or wrong. Now let's say we are
a salesperson whose job is to contact the management of different companies; the
validation of this information being right allows us to contact John and craft an email
depending upon his age group and other information about him that we might have
extracted.

The definition of intelligence may differ for different people. This is our own
definition based on our experience, but the bottom line is that it's about the usefulness of
the information. Unlike data, which states raw facts, actionable intelligence allows
us to take informed decisions.

As we discussed earlier, data is the raw form which just contains the entities.
An entity means anything tangible or intangible; it may be a name, place, character, or
anything. If it is just data it is worthless for us, as we do not know what it is about.
We can get lots of random data, but to use it we must understand what that data
is all about. Let's take another example: we got 1000 random strings, but what do we
do with them? If we come to know that those are some usernames or passwords,
then those 1000 random strings are worth a lot; we can use them for a dictionary attack,
brute force, etc.

It's not the data that is always valuable, it's the information about the data, or the
information itself, that is worth a lot.

Managing data is very important. Managed data can be quickly used to find relationships.
Let's say we have got lots of data and we know that the data consists of
names, email ids, mobile numbers, etc. If we do not manage it systematically in rows
and columns, we will lose track of it, and later when we need a particular set of
data, let's say names, it will be difficult for us to differentiate and fetch it from a large
amount of unmanaged data.

So it’s always important to manage the data by its types in a categorized manner 
so that later we can use the same quite easily. As seen in previous chapters there are 
various sources of information. Every source is unique in its own way. When all the 
data from different sources comes together it creates the complete picture and allows 
us to look at it from a wider perspective. 

As we covered Maltego earlier, we have seen how it collects data from different
sources, but even then there are many other sources. The thing to focus on here
is that it's not about running a tool and collecting all the data; it's about running
transformations one by one to get the desired final result. We need to extract data from
different sources and correlate and interpret it according to our needs. In most cases it's
not possible to get all the data we want from a single source, so we need to collect
different data from multiple sources to complete the picture.

For example, let's take a scenario in which we need to collect all the data about a
person called John. How do we do that? As John is quite a common name, it is very
difficult to get all the information. Let's start with some primary information. If we
can identify a picture of John then we might start with a simple Google search
to check the images; we might or might not get his picture. If we get the picture,
visit the page from where Google fetched it to get to know more about
John; if not, then simply try social networking sites like Facebook or LinkedIn, as
there is a chance that we can get the picture as well as the profile of John on one or
all of the social network sites. If we get the profile then we can get further
information like his email id, company name, position, social status, current city, and
permanent residence.

After getting those details we can use the email id to check in what other places
it is used, such as other sites, blogs, forums, etc. There are different online



FIGURE 10.3 

Some commonly used flowchart symbols. 


We discussed a bit about the methods which are usually used for data storage and/or
management. Now let's move on to learn about something different from the usual
stuff and see what other great options are available out there which can help us with
our data management and analysis needs.




MALTEGO 

Any open source assessment is not complete without the use of Maltego. It’s an integral 
part of OSINT. We already discussed a lot about this tool earlier and discussed how to 




MagicTree, simply open it and add some network address or host address to the scope so
that MagicTree will be able to build a data tree for the same. The advantage of storing
data in tree form is that if later we want to add some other data it will not affect the
tree; we just need to create a new tree. It stores the data in tabular or list form and
uses XPath expressions to extract data. There are many report templates that can be
customized and used for report generation.

The only limiting feature of this tool is that it only supports the import option for XML,
so we cannot add tools which generate text output. Although this is a limitation, the
tool is still pretty helpful for workflow automation and data retrieval from any tool,
and is also highly recommended for pentesters.



FIGURE 10.7 

MagicTree interface. 


KeepNote 

As the name suggests, KeepNote is a note taking application. It is a cross-platform
application which can be downloaded from http://keepnote.org/. Unlike traditional
tools for note making such as Notepad, KeepNote contains various features which
make note taking more efficient and allows us to include multiple media in our notes.

To start note taking using KeepNote we need to first create a new notebook from
the File option. Once a notebook has been created we can add new pages and subpages
into the notebook. Now in these pages we can keep our notes and keep them
categorized. We can simply write the text into the bottom right part of the interface.




image, marker, summary, attachment, audio notes, etc. The variety of data types
allowed by Xmind makes it very easy and effective to create a mind map which
can actually translate our ideas into a visual representation. We can use mind
maps to create diagrams for project management, planning, decision making, etc.



FIGURE 10.10 

Xmind sample template. 


Though the free version of Xmind has some limitations as compared to the pro version,
it still provides ample ways to visualize our ideas in a creative and effective manner.

There are various models and methodologies which are used in different domains
for the data analysis process. Some are generic and some only fit certain industries.
Here we are giving a basic approach which applies generically and can be modified
according to specific needs:

• Objective: Decide what question needs to be answered.

• Identify sources: Identify and list down the sources which can provide data 
related to our objective. 

• Collection: Collect data using different methods from all the possible sources. 

• Cleaning: From the data collected, anything that is irrelevant needs to be 
removed and the gaps present need to be filled. 

• Data organization: The cleaned data needs to be organized into a manner which 
allows easy and fast access. 

• Data modeling: Performing the modeling using different techniques such as 
visualization, statistical analysis, and other data analysis methods. 

• Putting in context: After we have performed the analysis of data we need to 
interpret it according to the context and then take decision based on it. 




Unlike other chapters where we focused on data gathering, here we focused on data
management and visualization. Collecting data is important, but managing it and
representing it in a form which makes the process of analysis easy is equally
important. As we learned earlier in this chapter, raw data is not of much
use; we need to organize it and analyze it to convert it into a form which is
actionable, and the tools mentioned in this chapter assist in that process. Once we have
analyzed the data and put it in context, we will achieve intelligence which helps us
to take decisions.

Moving forward, in the next chapter we will discuss online security. Day
by day cyberspace is becoming more insecure. New malware keeps surfacing
every now and then, attack techniques are advancing, scammers are developing new
techniques to trick people, etc. With all this around there is a lot that we need to
safeguard ourselves from. We will discuss tools and techniques to shrink this
gap in our security and will learn how to minimize our risk.




CHAPTER 11

Online Security



INFORMATION IN THIS CHAPTER 

• Security 

• Online security 

• Common online threats 

• Identify threats 

• Safety precautions 


INTRODUCTION 

In the previous chapters we have fiddled a lot with the internet. We have used a variety
of tools to access it in different forms, we have been to some of its lesser
touched areas, and we have learned how to stay anonymous while doing so. We
also learned about some tools which help during the analysis of all the data
we have collected from this massive information source. In this chapter we are going
to touch upon a topic which has become very relevant in today's digital age:
online security. The internet is a great place where we can learn different things, share
them with others, and much more. Today the internet is available worldwide, and accessing
it is pretty easy. We have devices which allow us to stay connected even on the move.

Today we rely on the internet for many of our needs such as shopping, paying our
bills, registering for an event, or simply staying social. Even our businesses rely on the
internet for their day to day operations. As users of the internet we use different
platforms, click on various buttons, and visit various links on a daily basis. For an average
user it may seem pretty simple, but it involves a huge amount of technical implementation
at the backend.

Similar to our physical world, this virtual world also has security issues; it's no
big news that people are becoming victims of cyber-crimes daily. There are a variety
of threats which we face in this digital world and sometimes we don't even recognize
them. For example, we all get that spam message stating that we have won a huge
amount of money in some lottery and we need to share some details to receive it.
Although most of us simply ignore such messages, some people do respond and are
victimized. Similarly we have already seen how the information we share online might
reveal something about us that we don't intend to. This kind of information in the wrong
hands can be used against us. Recently there have been many cases which involved
hackers attacking an employee's machine to gain access to corporate data.




The main reason behind the success of such attacks is the lack of security awareness among
users. Though understanding the technical aspects of cyber security can be
a bit complex for a nontechnical person, understanding how some of the common
attacks work, learning how to identify them, and finally how to mitigate them is a
necessity for every user. One question that is often asked is why someone would
try to hack us when we don't have any sensitive or financial information on
our computers. Instead of answering this question right away, let's learn about some
of the attack methods and then discuss why someone would attack an average user.

We are in a world where we love to spend more time online than socializing in person. The
reason may be anything, starting from shopping, e-mails, online social hangouts, chats,
and messages to online recharges or banking. We may use the internet for professional
purposes, as part of our day to day job, or for personal use to browse or surf.
Anyway, the point of discussing this is that the internet is now an integral part of life,
and it's quite difficult to avoid it.

Earlier we came across only one aspect, that is, internet privacy: how to
maintain privacy using different online solutions, browser settings, or anonymity
applications. What about the other aspect that we missed, that is, security? We
need to also shed some light on the security aspects of the internet.

When we say security aspect, it's not just about using secure applications or visiting
secure sites, or having security implementations on our system such as an updated
antivirus and firewall. In this case security also means data security, or to be
precise, internet data security.

Securing only the data of an organization is not enough; the users' data is also quite
important. So in the case of data security we need to focus on both
organizational data as well as users' data. For example, let's say an organization
implements proper security mechanisms to secure its data: all kinds of security software
starting from antivirus, firewall, intrusion detection system, intrusion prevention
system, and all other security tools implemented or installed in the system. But if the
security question of the HR's (Human Resources) e-mail id is "what is your favorite color"
and the answer is "pink", then all these security implementations will go in vain. So it's
both the users' data as well as the organizational data that are important, and we need to
take care of both.

From an organization's perspective we discussed a little bit about how metadata can disclose
certain information and how DLP (data leakage/loss prevention) can be helpful to
secure it. But from a user perspective it is also quite important not to share
details in public that can be used against us or our security. As a simple example, do
not disclose information in public that can be used to know more about our way
of thinking or about our certain areas of interest. Let's say we disclose information
that can be used to recover our password, such as the answers to
security questions, e.g., who is our favorite teacher, what place do we like the most,
what is our mother's maiden name, etc. These are common security questions which we
generally find in different online applications and which can be used as an additional
verification to recover passwords. If we disclose this information in public, let's
say on any social networking site, blog, or anywhere else, that can be a threat to




come from online where we try to access certain restricted sites such as adult sites, free
music, or software hosting sites, etc. So as a user, verify the source before downloading
anything. There are various classifications of malware, some of which are defined
below.


VIRUS 

Virus, or Vital Information Resources Under Seize, is a term taken from the normal
virus that affects a person and can be the reason for different diseases. Similarly a
computer virus is malicious code that, when executed in a system, infects the system
and performs malicious activities like deleting data, corrupting memory, adding
random temporary data, etc. The only weakness of a virus is that it needs a trigger for
execution. If our system contains a malicious piece of software that is infected by a virus,
until and unless we install it on our system there is nothing to fear. To avoid a virus
infection, use a genuine, updated antivirus.

TROJAN 

Trojan is quite an interesting malware. It generally comes as a gift: if we visit
restricted sites we may get advertisements such as "you won an iPhone,
click here to apply," or it may come bundled with popular paid games offered for free.
Once the user is lured into downloading and installing it, the application will create a
backdoor and provide all user actions to the attacker. So to spread a Trojan, if the attacker
chooses a popular, in-demand paid app, game, movie, or song, the chances of reaching
more people are quite high.

Trojans are nonself-replicating but hide behind another program. It is recommended
not to install any paid thing that comes for free; you never know what is
hidden inside that application. Also use antimalware on the system for better safety.

RANSOMWARE 

As the name suggests, it is quite an interesting malware which, after infecting the
system, blocks some popular and important resources of our computer system and then
demands ransom money to give back the access. Usually ransomware uses encryption
technologies to hold our data captive. The recommendation is the same as
mentioned above.

KEYLOGGER 

Keylogger is a piece of malware that collects all the keystrokes and sends them
to the attacker. So when a user enters credentials for any site, the credentials can be
recorded and sent back to the attacker, and that can later be used by the attacker for
account takeover. The recommendation for this is that if you are typing credentials for
any transaction-related site or any other critical information, always use
an on-screen keyboard.




PHISHING 

It is one of the oldest and still popular attacks, and it is also used in many corporate
attacks. It is a simple attack where the attacker tricks the user by sending a fake link
that leads to a page which looks quite similar to the original site page where the user
needs to log in. Once the user logs in on that page, the credentials are sent to the attacker
and the user can be redirected to the genuine site. The major weakness in this attack is the
site address: if a user verifies the site address properly then there is very little chance of
becoming a victim of a phishing attack.

The information needed here is which sites the target has accounts on and
which sites the target generally visits quite often, so that later the attacker can create a
fake page of one of those and trick the user.

There are many new ways of phishing attack techniques available now. Some 
are desktop phishing where the host entry of the victim’s system will be changed 
such as it will add an entry on the host file with the sites’ original domain name with 
the address where the fake page is installed. So when a user types the IP address or 
domain name in the browser it will search for host entry. The host entry will redirect 
and call the fake page server, and the fake page will be loaded instead of the real page. 
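As a purely illustrative sketch (the IP address and domain name below are placeholders, not taken from any real incident), a single added hosts file entry is enough for such a redirection:

192.0.2.10    www.examplebank.com

Here 192.0.2.10 would be the server hosting the fake login page, so every request for www.examplebank.com from that machine silently lands on the attacker's copy.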

Another such popular phishing attack is tabnabbing. In tabnabbing, when the user opens a new
tab, the original page is silently replaced with a fake page through redirection. There are also other
popular phishing attacks such as spear phishing.


ONLINE SCAMS AND FRAUDS 

One of the most widely faced issues online is spam mails and scams. Most e-mail users receive
such mails on a daily basis. These mails usually attempt to trick users into sending their personal
information and ultimately skim their money. Sometimes it is a huge lottery prize that we have
won, or a relative in some foreign country who left us a huge amount of money.

Maxwell Tobo Nov 24 at 10:42 PM 


Beloved Friend, 

I am writing this mail to you with heavy tears In my eyes and great sorrow in my heart because my Doctor 
told me that I will die in three months time. Base on this development I want to will my money which is 
deposited in a security company. I am in search of a reliable person who will use the Money to build charity 
organization for the saints and the person will take 20% of the total sum. While 80% of the money will go to 
charity organization and helping the orphanage. I grew up as an Orphan and i don't have anybody/family 
member after the missing of my adopted son with Malaysia Airlines Flight MH370. Meanwhile at this point I 
do not have anyone to take care of my wealth The total money in question is $7 5million dollars I will 
provide you with other information’s once you indicate your willingness 

Please contact me on my personal email on: maxtobo555@gmail.com 


Yours sincerely, 
maxwell tobo 

FIGURE 11.1 

A sample spam mail. 




Scammers also try to exploit the human tendency toward kindness by writing stories that
someone is stuck in a foreign land and needs our help, and other such incidents. Sometimes
attackers also pose as an authority figure asking for some critical information, or as the e-mail
service provider asking to reset the password. There are also various Ponzi schemes used by
scammers with the ultimate purpose of taking away our hard-earned cash.


HACKING ATTEMPTS 

There are cases where users with updated operating systems, antivirus, and firewall still face
issues and become victims of hacking attacks. The reason is flaws in certain popular applications
that can be found on almost any operating system, such as Adobe Acrobat Reader or simply the
web browsers. These kinds of applications are targeted widely because they cover almost all the
operating systems and are themselves widely used, so targeting them allows an attacker to hack
as many users as possible. Attackers either create browser plugins or addons that help the user to
complete or automate a process, while the same code in the backend is used for malicious
intentions, i.e., collecting all the user's actions performed in the browser.


WEAK PASSWORD 

Weak passwords always play a major role in any hack. For the ease of the user, sometimes
applications do not enforce password complexity, and as a result users pick simple passwords such
as password, password123, Password@123, 12345, god, or their own mobile number. A weak
password is not only about length and the characters used; it is also about guessability.
Name@12345 looks like quite a complex password but can be guessed easily. So do not use
passwords related to a name, place, or mobile number. Weak passwords can be guessed, or brute
forced if the length of the password is very small, so try to use random strings with special
characters. Though they can be hard to remember, from a security point of view they are quite
secure.
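As a minimal sketch of such checks in Python 2.7 (the word list and rules below are illustrative, not a complete strength meter), note that simple complexity rules alone cannot catch passwords that are merely guessable, such as ones built from a name or mobile number:

import re

# A tiny list of well-known weak passwords; real checkers use much larger dictionaries.
common = ["password", "password123", "12345", "123456", "qwerty", "god"]

def is_weak(password):
    # Too short or a well-known password is weak regardless of the characters used.
    if len(password) < 8 or password.lower() in common:
        return True
    # Expect a mix of lowercase, uppercase, digits, and special characters.
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^a-zA-Z0-9]"]
    missing = [c for c in classes if not re.search(c, password)]
    return len(missing) > 1

print is_weak("password123")   # True: it is in the common list
print is_weak("Name@12345")    # False by these rules, yet still guessable if Name is yours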

A strong password also needs to be stored properly. Let's say, for example, I created a huge metal
safe to store all my valuable things and put the key just on top of it; it won't provide security. It's
not just about the safe but also about the security of the key. Similarly, creating a very complex
password won't serve the purpose if we write it down and paste it on our desk; the password, like
the key, also needs to be kept safe.


SHOULDER SURFING 

Shoulder surfing is always a challenge with a known attacker, a person whom you know and work
with. If he/she wants to hack your account then it is quite easy to do while you are typing the
password. One way to make it difficult is to type some correct password characters, then some
wrong characters, then remove the wrong characters and complete the password; or else simply do
not enter your password when someone is around.


SOCIAL ENGINEERING 

The first thing that comes to mind when we read social engineering is "there is no patch for human
stupidity," or that the human is the weakest link in the security chain. This is a kind of attack which
is carried out against the trust of the user. In this attack the attacker first wins the trust of the victim
and then collects all the information needed to execute one of the attacks we discussed above, or
any other attack. The only way to avoid becoming a victim is to trust no one; you never know when
your boyfriend/girlfriend will hack your account. Jokes apart, do not disclose to anyone any
information that has a possible significance for security.

So these were some of the security-related challenges that we face everyday, but 
we have only covered the problems. Let’s move on to see what are the solutions. 


ANTIVIRUS 

As we discussed, there are various kinds of malware out there and each one has a unique attack
method and goal. There is a huge variety of these and most computer users have faced this problem
at some point of time.

Antivirus is one of the security products widely used by organizations as well as individuals. An
antivirus is basically a software package which detects malware present on our computers and
tries to disinfect them. Antiviruses rely on signatures and heuristics for malware, and based upon
these they identify the malicious code which could cause any digital harm. As new malware is
identified, new signatures and heuristics are created and updated into the software to maintain
security against the new threats.

Many antiviruses have been infamous for slowing down the system and making it difficult to
use; the frequent updates have also annoyed people a lot. Recently antiviruses have evolved to
become less annoying and more efficient. Many solutions also provide additional features such as
spam control and other online security features along with the antivirus. The regular update is not
just for the features but also to keep the signature database updated to maintain security. There are
various choices in the market for antivirus solutions, free as well as commercial, but it all comes
down to which one is the most up to date, because new malware keeps on surfacing every day. One
more thing to keep in mind is that there is also malware posing as antivirus solutions, hence we
need to be very careful when choosing an antivirus solution and should download it only from
trusted sources.




IDENTIFY PHISHING/SCAMS 

We encounter a huge number of scam and phishing mails on a daily basis. Today e-mail services
have evolved to automatically identify these and put them in the spam section, but still some of
them manage to bypass the filters. Here are some tips to identify these online frauds:

• Poor language and grammar: Usually the body of such mails is written in poor language and
with incorrect grammar.

• Incredibly long URL and strange domain: The URLs mentioned in such e-mails or the URLs of
the phishing pages can be checked by simply hovering the mouse over the link. Usually such
URLs are very long and the actual domains are strange; this is done to hide the real domain
while the name of the site being phished still appears somewhere in the link (a small sketch for
checking this follows the list).

• Poor arrangement of the page: The arrangement of the text and images is generally poor as
many attackers use tools to create such e-mails; sometimes the alignment also changes because
of a change in resolution.

• E-mail address: The actual sender address (not just the display name) should be checked to
verify the sender.

• Missing HTTPS: If the page is usually served over HTTPS and it is missing this time, then this
is an alarming sign.

• Request for personal/sensitive information: Usually no organization asks for personal or
sensitive information over e-mail. In case such an e-mail is received it is better to verify by
calling the organization before sending any such information.

• Suspicious attachments: Sometimes these kinds of e-mails also contain an attachment file in the
name of a form or document, usually with strange extensions such as xyz.doc.exe to hide the
original file type. Unless trusted and verified, these attachments should not be opened. In case
an attachment needs to be opened it should be done in a controlled environment such as a
virtual machine with no network connection.
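Tying the URL and attachment checks together, here is a minimal sketch in Python 2.7; the link and file name below are made up purely for illustration:

from urlparse import urlparse

def real_domain(link):
    # Whatever familiar names appear in the path or query, only the netloc is actually contacted.
    return urlparse(link).netloc

def has_double_extension(filename):
    parts = filename.lower().split(".")
    risky = ["exe", "scr", "js", "vbs", "bat"]
    return len(parts) > 2 and parts[-1] in risky

link = "http://login.examplebank.com.attacker-site.biz/secure/index.php?user=foo"
print real_domain(link)                    # login.examplebank.com.attacker-site.biz
print has_double_extension("xyz.doc.exe")  # True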


UPDATE OPERATING SYSTEM AND OTHER APPLICATIONS 

One of the major methods used by attackers to gain access to our machines is to attack through the
applications running on the system. The operating system we use, and the applications running
over it, contain flaws in the form of vulnerabilities. Attackers use exploit code targeting specific
vulnerabilities to get a connection to the computer system. New vulnerabilities are discovered on a
regular basis and hence the risk keeps on increasing; on the other hand, patches for these
vulnerabilities are also released by the vendors. Keeping our machine's software updated is an
effective method to minimize the risk of being attacked.

Almost all operating systems come with mechanisms which allow them to update with the
recent patches available. They also allow us to manually check for updates and install them if
available. Apart from this, other applications that we use, such as multimedia players, document
readers etc., also have patches; some of them are updated automatically while others need to be
downloaded separately and installed.




Secunia PSI is a Windows-based application which helps us to identify outdated software and is
also capable of automating the process of updating it. It can simply run in the background, identify
the applications that need to be updated, download the appropriate patch, and install it. In case it is
unable to do so, it notifies the user and provides useful instructions.


ADDONS FOR SECURITY 

Web browsers are one of the most widely used applications on any platform and also 
the medium for most of the attacks. Let’s learn about some easy-to-use addons which 
can help us to stay secure online. 


WEB OF TRUST (WOT) 

WOT is a service which reviews website reputation using a crowdsourced method. Based on the
reviews of the crowd, the addon lets us know how a website is rated on the scales of trustworthiness
and child safety. Similarly, users can also rate a website and hence contribute to making the web a
safer place. Details and comments about the website being visited can also be viewed, which helps
users make an informed decision. Using the addon is pretty simple: visit the website, click on the
WOT icon in the browser bar, and it will display the related details. The addon is available at
https://www.mywot.com/en/download for different browsers.



FIGURE 11.2 

Web of trust (WOT) in action. 





FIGURE 11.3 

Microsoft Baseline Security Analyzer scan result. 

Similarly there is the Linux Basic Security Audit (LBSA). This is a script which aims at making
Linux-based systems more safe and secure, though the settings should be applied depending upon
the requirements and might not be suitable for all scenarios. More details about it can be found at
http://wiki.metawerx.net/wiki/LBSA.

Using such free and easy-to-use utilities we can certainly identify the gaps in our security and
take appropriate steps to patch them.


PASSWORD POLICY 

As we use keys for authentication in the real world, similarly we use passwords in the digital
world. Passwords are combinations of characters from different sets (alphabets, digits, special
characters) which we provide to prove that we are the rightful owner of specific data or a service.
Using passwords we access our computers, our social profiles, and even bank accounts. Though
passwords are of such relevance, most of us choose to have a weak password. The reason behind it
is that as humans we have a tendency to choose things which are easy to remember. Attackers
exploit this human weakness and try to access our valuable information through different
techniques. Without going into the technical details of such attacks, some of them




PRECAUTIONS AGAINST SOCIAL ENGINEERING 

One of the main techniques used by hackers to extract sensitive information from victims is social
engineering. We as humans are naturally inclined to help others, respond to authority, and
reciprocate. Using these and other similar weaknesses (in the context of security) of human nature,
attackers exploit us to make us reveal something sensitive or take an action which might not be in
our favor. People simply pose as the tech guy and ask for the current password, or claim to be the
CTO of the company speaking and ask the receptionist to forward some details. To safeguard
against such attacks, security awareness is very important. People need to understand what
information is sensitive in nature. For example, it might seem that there is no harm in telling
someone the browser version we are using at the enterprise, but this information is very valuable
for an attacker. Also, one may trust but must verify: people should ask for proof of identity and
cross-verify it to check if the person actually is who he/she claims to be. In case of doubt it is better
to ask someone higher in authority to make the decision than to simply do as told.


DATA ENCRYPTION 

In the end, the motive behind most attacks is to access data. One step to stop this from happening
is to use disk encryption software. What it does is encrypt the specified files or drives on our
machine with a strong encryption method and make them password protected. Even if the machine
is compromised, it would be very difficult for the attacker to get the data. There are many solutions
available which provide this functionality, such as BitLocker and TrueCrypt. It is advised to check
that the software being used has no publicly known vulnerability in itself. Similarly it is advised to
store and send all sensitive online data in encrypted form.
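Beyond drive-level tools, individual files can also be encrypted before being stored or sent. As a minimal sketch, assuming the third-party cryptography package is installed (pip install cryptography) and using placeholder file names:

from cryptography.fernet import Fernet

# Generate and keep this key somewhere safe; losing it means losing the data.
key = Fernet.generate_key()
f = Fernet(key)

plaintext = open("secret.txt", "rb").read()
token = f.encrypt(plaintext)            # encrypted blob, safe to store or send
open("secret.enc", "wb").write(token)

# Later, with the same key, the original data can be recovered.
recovered = f.decrypt(token)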





FIGURE 11.5 


BitLocker Drive Encryption. 




CHAPTER 


Basics of Social Networks 
Analysis 



INFORMATION IN THIS CHAPTER 

• Social network analysis 

• Gephi 

• Components 

• Analysis 

• Application of SNA 


INTRODUCTION 

In one of the recent chapters we discussed the importance of data management and analysis. We
also learned about certain tools which could be useful in the process. In this chapter we will deal
with an associated topic, which is social network analysis (SNA). SNA is widely used in
information science to study various concepts. It is a wide topic and has applications in many
fields; in this chapter we will attempt to cover the important aspects of the topic and the tools
required for it, so that readers can further utilize it according to their personal needs.

As the name suggests, social network analysis is basically the analysis of social 
networks. The social network we are talking about is a structure which consists of 
different social elements and the relationship between them. It contains nodes which 
represent the entities and edges representing relationships. What this means is that, 
using SNA we can measure and map the relationships between various entities, these 
entities usually being people, computers, a collection of them, or other associated 
terms. SNA utilizes visual representations of the network for the purpose of better 
understanding it and implements mathematical theories to derive results. There are 
various tools that can be used to perform SNA and we will deal with them as and 
when required. 

Let’s deal with some basic concepts. 

NODES 

Nodes are used to represent entities. Entities are an essential part of a social network as the whole
analysis revolves around them. They are mostly depicted with a round shape.




EDGES 

Edges are used to represent the relationships. Relationships are required to establish 
how one node connects to another. This relationship is very significant as it helps to 
perform various analyses such as how information will flow across the network etc. 
The number of edges connected to a node defines its degree. If a node has three links 
to other entities, it has degree 3. 

NETWORK 

The network is visually represented and contains nodes and edges. Different param¬ 
eters of nodes and edges such as size, color etc., may vary depending upon the analy¬ 
sis that needs to be performed. 

Networks can be directed or undirected, which means that the edges might be represented as
simple lines or as directed arrows. This primarily depends upon the relationships between the
nodes. For example, a network of mutual connections such as friendships can be undirected,
whereas a network of relations such as who likes whom would be directed.

Now we have a basic idea of SNA. Let’s get familiar with one of the most utilized 
tools for it. 


GEPHI 

Gephi is a simple yet efficient tool used for the purpose of SNA. The tool can be downloaded from
http://gephi.github.io/ and the installation process is pretty straightforward. Once installed, the tool
is ready to be used. The interface is simple and is divided into different sections. There are three
tabs present at the top left corner which allow working with the network in different ways. These
three tabs are Overview, Data Laboratory, and Preview.


OVERVIEW 

The Overview tab provides the basic information about the network and displays the network
visualization. It is primarily divided into three sections which further have subsections. The
left-hand panel consists of sections which allow partitioning and ranking of nodes and edges, and
applying different layouts to the network based on different algorithms. The middle section
consists of the space where the network is visualized and the tools to work with the visualization.
The right-hand sections contain information about the network, such as the number of nodes and
edges, and operations such as calculating the degree, density, and other network statistics.





FIGURE 12.1 

Gephi Overview. 

DATA LABORATORY 

Under the Data Laboratory tab we can play with the data in its raw form. In this tab, 
the entities and their relationships are displayed in the form of a spreadsheet. Here 
we can add new nodes and edges, search for existing ones, import and export data, 
and much more. We can also work on columns and delete them, copy them, duplicate 
them etc. The data present can also be sorted depending upon different parameters by 
simply clicking on the row names. 



FIGURE 12.2 

Gephi Data Laboratory. 




PREVIEW 

In the Preview tab we can change various settings related to the properties of the network graph,
such as the thickness of the edges, color of the nodes, border width etc. This helps us to set different
values for different parameters so that we can make recognizable distinctions based on different
properties of the graph. The settings can be made in the left-hand panel and the changes are
reflected in the rest of the section available for preview.

There are many other tools available for SNA, some of which are SocNetV (http://socnetv.sourceforge.net/),
NodeXL (http://nodexl.codeplex.com/), EgoNet (http://sourceforge.net/projects/egonet/) etc.

The term network here is much the same as the one we use in computing or in other fields such as
mathematics or physics. The terminology used for a precise definition might change in different
areas of study, but the bottom line is that a network is a connection of different entities through
relationships. As we discussed the network a bit earlier, now it's time to dig into it further. To keep
the approach simple we will use the term "NODE" for entities and the term "EDGE" for
relationships.

To create a meaningful and easy to understand network, or graphical representation of a
network, we need to focus on certain areas: highlight the widely used and important nodes and
edges, remove nodes with no data or edges, remove redundant data, and group similar nodes based
on geographical location, community, or anything else that broadly relates them. These are the
basic practices to keep in mind while creating a meaningful and easy to understand network.

The components of a network, the nodes and the edges, have certain attributes based on which we
can create a network. Those attributes play a vital role in understanding a network and its
components better. Let's start with the node.

As discussed earlier, a node has a property called degree, which is simply the number of edges
connected to that node and is a rough measure of how connected it is. Whether the edges are
directed or undirected also matters. Let's say the number of directed edges pointing toward a node
X is 5 and the number of directed edges pointing away from X is 2. Then the degree of X is 7,
because it is the combination of in-degree (5) + out-degree (2).
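The same calculation can be sketched in a few lines of plain Python; the edge list here is made up:

# Directed edges written as (source, target) pairs.
edges = [("A", "X"), ("B", "X"), ("C", "X"), ("D", "X"), ("E", "X"),
         ("X", "F"), ("X", "G")]

def degree(node, edges):
    in_degree = sum(1 for src, dst in edges if dst == node)
    out_degree = sum(1 for src, dst in edges if src == node)
    return in_degree, out_degree, in_degree + out_degree

print degree("X", edges)   # (5, 2, 7) -> in-degree 5, out-degree 2, degree 7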


NODE ATTRIBUTES 

Every node in a network can have a range of attributes that can be used to distinguish some
properties of the node.

An attribute can be binary, expressing something in simple true/false, yes/no, online/offline, or
married/unmarried terms. This is one of the easiest representations of a node attribute, where we
have to choose one out of two options.

An attribute can be categorical when more than two options are available. For example, if we
want to set an attribute on a node called relationship, we can use different categories as options,
e.g., 1. Friend, 2. Family, 3. Colleague.

An attribute can also be continuous, based on information that cannot be the same for every
node, for example date of birth, job position etc. We can use such attributes to distinguish a node
quite easily.




EDGE ATTRIBUTES 

DIRECTION 

Based on direction, two major types of edges can be found. 

1. Directed edges 

2. Undirected edges 

Directed edges 

Directed edges are edges with a unidirectional relationship. The best example of a directed edge is
X → Y. Here X is unidirectionally related to Y. We can say that Y is a child of X, or X loves Y, or
any such one-sided relationship.

Undirected edges

Undirected edges can be used for establishing mutual relationships, such as X ↔ Y or X — Y.
The relationship can be anything, like X and Y are friends, classmates, or colleagues.

TYPE 

Type is the kind of relationship that puts an edge into a group. Let's say there are different nodes
and edges, but some of the edges are similar by type, let's say a group; then we can distinguish
them quite easily. The type can be anything, starting from friend, close friend, colleague, relative
etc., and it has a significant role in differentiating edges.


WEIGHT 

Weight can be the number of connections that two nodes have between them. For example, if
X ↔ Y share more than one mutual/undirected or directed edge with each other, then the weight of
that edge is that number; if X relates to Y in five ways then the weight of that edge is 5. We can
simply draw five edges between those two nodes, or we can draw a thicker edge between them to
make it easy to see that these two nodes share a higher-weighted edge.

Weight can also be of two types:

1. Positive

2. Negative

Positive weight

Positive weight is based on the likelihood of a relationship. For easy understanding, let's talk about
a politician: many people like a particular politician, so the relationships they establish with that
politician carry positive weight.

Negative weight

Similarly, negativity, hate, or unlikelihood can also be a factor in a relationship; that can be
measured by negative weight.




RANKING 


Based on the priorities of the relationships established between two nodes, edges can have
different rankings. For example, X's favorite subject is Math and X's second favorite subject is
Physics. To differentiate between these priorities, ranking comes into play for easy understanding
of a network.


BETWEENNESS 


There are certain scenarios where two different groups of nodes are connected to each other by an
edge. Such edges have the unique quality of joining two different groups or sets of nodes, and that
quality is called betweenness. There are many other attributes that can be found situationally. For
the time being we can say that we have basic knowledge of a network, its components, and its
attributes, so that if in future we get a chance to create a network, or are given one to understand,
we can at least understand the basics of it properly.

The core basics of the network and its components are covered above, but we still haven't
covered the main topic, that is SNA.

As discussed earlier in the chapter, SNA is about mapping and measuring relationships between
different entities. These entities can be people, groups, organizations, systems, applications, and
other connected entities. The nodes in the network are usually people, but they can be anything
depending on what network we are looking at, while the links represent relationships or flows
between the nodes. SNA provides both mathematical as well as graphical analysis of relationships,
using which an analyst can derive a number of conclusions, such as who is a hub in the network,
how different entities are connected to each other, and why they are connected, with proper logical
and data-driven answers. The factors that come into play, such as degree, betweenness, and others,
are already covered.



FIGURE 12.3 


A small sample network to understand different components. 




Gatekeeper/Boundary spanners

An entity who mediates, or we can say controls, the flow between one portion of the network and
another. Earlier we used a different name for it, boundary spanner; these are just different keywords
with the same definition. In our previous example, "F9" and "F3" are the gatekeepers.

Bridge

A bridge is an edge which links/belongs to two or more groups. In the previous example, there are
three bridges: (1) F9 → F10, (2) F3 → F4, (3) F3 → F5.

Liaison 

An entity which has links to two or more groups that would otherwise not be linked, but is not a
member of either group. Our previous example does not have any node in the position of a liaison.


Isolate 


As the name suggests, an isolate is an entity which has no links to other entities; generally a
linkless or edgeless node. In our previous example, we do not have any isolate nodes.

So there are certain roles that are not present in our example; here is a new network that
contains all the roles, highlighted properly for easy understanding.




FIGURE 12.4 

Network highlighting different roles. 

SNA can be helpful in many ways to understand information flow. We can use it in a variety of
situations, such as predicting exit poll results from the verdict of online users, identifying how and
to what extent a piece of information will flow in a network of friends, understanding an
organizational culture, or even finding the loopholes in a process.

For a simpler, generic scenario, we can create a network of Twitter users of a community or
organization and see who is following whom and who is being followed. This would help us to
understand who the key players are in that structure and who creates the most influence. Similarly
we can also understand who is more of a follower type and who are the leaders. In a network of
professionals in an organization, it can be used to identify the people who form a hierarchy, and
which path would be better if one professional needs to connect to another who is not a direct
connection.

Similarly it can be used to analyze a network of connected people to identify how a
communicable disease would spread in the network and which links need to be broken before the
whole network gets infected. Another example could be a network of market leaders of an
industry, to identify who is the hub in that network and needs to be targeted and influenced for a
decision to be taken.

Most of the attributes and functions that we have discussed in this chapter can be automatically
calculated using Gephi. It also has many algorithms which can be utilized to lay out the network,
identify key elements, implement filters, and perform various other operations. It can also be
extended using the various plugins available under the Tools menu.



FIGURE 12.5 

Sample network in Gephi with different values calculated. 

SNA is used by various social network platforms and organizations which deal with connections
between people; similarly it has applications in many domains which depend on information
science to flourish in their market.

We have learned something new in this chapter and can use it in future for easier understanding
of any complex system by creating a simpler network for it.




Quick and Dirty Python 


CHAPTER 



INFORMATION IN THIS CHAPTER 

• Introduction to programming 

• Python intro 

• Python components 

• Examples and Samples 

• Creating tools and transforms 


INTRODUCTION 

After covering many interesting topics related to utilizing different automated tools, in this chapter
we will learn to create some. Sometimes there is a need to perform a specific task for which we are
not able to find any tool which suits the requirements; this is where some basic programming
knowledge helps, so that we can quickly write some code to perform the desired operation. This
chapter will touch upon the basics of the Python programming language. We will understand why
and how to use Python and what the basic entities are, and then we will move on to create some
simple but useful code snippets. It is advised to have some programming knowledge before moving
on with this chapter, as we will only cover the basic essentials related to the language and jump
straight into the code. Though the examples used are simple, having some programming experience
will be helpful.

Anyone who has some interest in computer science is familiar with the concept of programming.
In simple terms it is the process of creating a program to solve a problem. To create this program we
require a language using which we can write instructions for the computer to understand and
perform the task. The simple objective of a computer program is to automate a series of instructions
so that they need not be provided one by one manually.

PROGRAMMING VERSUS SCRIPTING 

The language we are going to discuss in this chapter is Python, which is commonly termed a
scripting language, so before moving further let's understand what that means. Usually the code
written in a programming language is compiled to machine code using a program called a
compiler to make it executable. For example, code written in the C++ language is compiled to
create an exe file which can be executed on a Windows platform. There is another kind of program
called an interpreter which allows running code without it being compiled. So if the execution
environment for a piece of code is an interpreter, it is a script. Python is usually executed in such an
environment and hence is commonly called a scripting language. This does not mean that a
scripting language cannot be compiled; it simply is not usual. All scripting languages are
programming languages.


INTRODUCTION TO PYTHON 

Python is a high-level programming language created by Guido van Rossum, which emphasizes
readability of code. Python is very fast to work with and allows solving problems with a minimum
amount of code, and hence is very popular among people who need to create quick scripts on the
go, such as pentesters. There are various versions of Python but we will be focusing on the 2.7
version in this chapter. Though the latest version as of now is 3.4, most of the Python tools and
libraries available online are based on the 2.7 version and the 3.x version is not backward
compatible, hence we will not be using it. There are some changes in the 3.x version, but once we
get comfortable with 2.7 it won't require much effort to move to it, if required.

The main agenda behind this chapter is not to create a course on Python; that would require a
separate book in itself. Here we will cover the basics quickly and then move on to creating small
and useful scripts for general requirements. The aim is to understand Python, write quick snippets,
customize existing tools, and create our own tools as per requirements. This chapter strives to
introduce the possibilities of creating efficient programs in a limited period of time, provide the
means to achieve it, and then further extend it as required.

There are other alternatives to Python available, mainly Ruby and Perl. Perl is one of the oldest
scripting languages and Ruby is widely used for web development (Ruby on Rails), yet Python is
one of the easiest and simplest languages when it comes to rapidly creating something with
efficiency. Python is also used for web development (Django).

INSTALLATION 

Installing Python on Windows is pretty straightforward: simply download the 2.7 version from
https://www.python.org/downloads/ and go through the installer. Linux and other similar
environments mostly come with Python preinstalled.

Though not mandatory, it is highly recommended to install Setuptools and Pip for easy
installation and management of Python packages. Details related to Setuptools and Pip can be
found at https://pypi.python.org/pypi/setuptools and https://pypi.python.org/pypi/pip respectively.

MODES 

We can run Python basically in two ways, one is to directly interact with the interpreter, 
where we provide the commands through direct interaction and see the output of it (if 
any) and other one is through scripts, where we write the code into a file, save it as 




IDENTIFIERS 

In programming, identifiers are the names used to identify any variable, function, 
class, and other similar objects used in a program. In Python, they can start with 
an alphabet or an underscore followed by alphabets, digits, and underscore. They 
can contain a single character also. So we can create identifiers accordingly, except 
certain words which are reserved for special purposes, for example, “for,” “if,” “try,” 
etc. Python is also case sensitive which means “test” and “Test” are different. 


DATA TYPES 

Python has different variable types, but the type is decided by the value passed and does not need
to be stated explicitly. Actually the data type is not associated with the variable name but with the
value object, and the variable simply references it. So a variable can be assigned a value of another
data type even after it already refers to a different data type.


C:\Python27>python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> test=10
>>> test
10
>>> test="This is a test"
>>> test
'This is a test'
>>>


FIGURE 13.2 

Value assignment. 

Commonly used data types are: 

• Numbers 

• String 

• Lists 

• Tuples 

• Dictionaries 

To define a number simply assign a variable a number value, for example,

>>>samplenum=10

Just so you know, there are various numerical types such as float, long, etc.

To define a string we can use quotes (both single and double), for example,

>>>samplestr="This is a string"

>>>samplestr2='This is another string'

We can also utilize both types of quotes in a nested form. To create multiline strings we can use
triple quotes.
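The remaining types listed above follow the same pattern of simple assignment; as a quick illustrative sketch (the values are arbitrary):

>>>samplelist=[1, 2, "three"]            # lists are mutable, defined with square brackets
>>>sampletup=(1, 2, "three")             # tuples are immutable, defined with parentheses
>>>sampledict={"name": "test", "id": 1}  # dictionaries hold key-value pairs
>>>samplelist[0]
1
>>>sampledict["name"]
'test'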




single code. The examples shown further in this chapter will work on this concept and we will be
using spaces.

Basic terms (class, function, conditional statements, loops, etc.)

Now let's move forward with conditional statements.

The most basic conditional statement is "if." The logic is simple: if the provided condition is
true, it will execute the statement, else it will move on. The basic structure of "if" and the
associated conditions is shown below.

if condition:
    then_this_statement
elif another_condition:
    then_this_other_statement
else:
    else_statement

Example code

#!/usr/bin/python
a=10
b=12
c=15
if (a==b):
    print "a=b"
elif (b==c):
    print "b=c"
elif (c==a):
    print "c=a"
else:
    print "none"

Write this in a notepad file and save it as if_con.py. This code will print the response "none" when
executed in Python. The "elif" and "else" conditions are not mandatory when using an "if"
statement and we can have multiple "elif" statements. Similarly we can also have nested "if"
conditions, where there will be if statements within another if statement; just keep the indentation
proper.

if condition:
    then_this_statement
    if nested_condition:
        then_this_nested_statement
    else:
        then_this_nested_else_statement

The "while" loop is next in line. Here we provide a condition and the loop will run until that
condition is true. The structure of "while" is shown below.

while this_condition_is_true:
    run_this_statement




Example code

#!/usr/bin/python
a=10
c=15
while (a<c):
    print a
    a=a+1

Output

10
11
12
13
14

We can also utilize the "break" and "continue" statements to control the flow of the loop. The
"break" statement is used to break out of the current loop and the "continue" statement is used to
pass the control back to the start of the loop. There is one more interesting statement called "pass"
which does nothing in particular and is used just as a placeholder.
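As a small illustrative sketch of these statements (the numbers are arbitrary):

#!/usr/bin/python
# Skip even numbers with "continue" and stop completely at 7 with "break".
i=0
while True:
    i=i+1
    if i==7:
        break
    if i%2==0:
        continue
    print i

This prints 1, 3, and 5, then stops when i reaches 7.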

Another useful statement is the "for" loop. Using it we can iterate through the items present within
an object such as a tuple or a list.

Example code

#!/usr/bin/python
sample_tup=('123','test',12,'w2')
for items in sample_tup:
    print items

Output

123
test
12
w2

We are simply taking the individual values in the tuple sample_tup, putting them into the variable
items one by one, and printing them.

Example code

#!/usr/bin/python
str="String"
for items in str:
    print items




Output

First Function
Argument Return
Second Function
Argument Return

Here the function __init__ is the constructor of the class and is the first function which runs when
an object of the class is created. The variable "classobj" is the object for the class "sample_class"
and using it we can communicate with the objects inside the class. As discussed earlier, we can also
create this as a module and call it inside another program. Let's take another example of importing
modules.

Example code

#!/usr/bin/python
class sample_class:
    def __init__(self, classarg):
        self.cla=classarg
    def firstfunc(self):
        print "First Function"
        return self.cla+" Return"
    def secfunc(self):
        print "Second Function"
        return self.cla+" Return"
classobj=sample_class("Argument")

This file is saved as mod.py and another file calls it as a module with the code:

#!/usr/bin/python
from mod import *
print classobj.firstfunc()

Output

First Function
Argument Return

In Python we can also create directories of modules for better organization through packages.
They are hierarchical structures and can contain modules and subpackages.

WORKING WITH FILES 

Sometimes there is a need to save or retrieve data from files, so we will learn how to deal with files
in Python.

First of all, to open a file we need to create an object for it using the function open and provide
the mode of operation.

>>>sample_file=open('text.txt',"w")

Here the name sample_file is the object and using the open function we are opening the file
text.txt. If a file with this name does not already exist it will be created, and if it already exists it
will be overwritten. The last portion inside the parentheses describes the mode; here it is w, which
means write mode. Some other commonly used modes are "r" for reading, "a" for append, "r+" for
both read and write without overwriting, and "w+" for read and write with overwriting.

Now that we have created an object, let's go ahead and write some data to our file.

>>>sample_file.write("test data")

Once we are done with writing data to the file we can simply close it.

>>>sample_file.close()

Now to read the file we can do the following:

>>>sample_file=open('text.txt',"r")

>>>sample_file.read()

'test data'

>>>sample_file.close()

Similarly we can also append data to files using the "a" mode and the write() function.

Python has various inbuilt as well as third-party modules and packages which are very useful.
In case we encounter a specific problem that we need to solve using Python code, it is better to look
for an existing module first. This saves a lot of time spent figuring out the steps and writing a huge
amount of code, by simply importing the modules and utilizing the existing functions. Let's check
some of these.

Sys

As stated in its help file, this module provides access to some objects used and maintained by the
interpreter and to functions that interact strongly with it.

To use it we import it into our program.

import sys

Some of the useful features provided by it are argv, stdin, stdout, version, exit(), etc.
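As a tiny illustrative sketch of argv and exit() (the script and argument names are made up):

#!/usr/bin/python
import sys

# sys.argv is the list of command line arguments; argv[0] is the script name itself.
if len(sys.argv) < 2:
    print "Usage: python greet.py <name>"
    sys.exit(1)
print "Hello " + sys.argv[1]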


Re

Many times we need to perform pattern matching to extract relevant data from a large amount of it.
This is where regular expressions are helpful. Python provides the "re" module to perform such
operations.

import re
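For example, here is a minimal sketch that pulls e-mail addresses out of a block of text; the text and the pattern are illustrative only, and the pattern is not an exhaustive e-mail validator:

import re

data = "Contact us at info@example.com or sales@example.org for details."
# findall returns every non-overlapping match of the pattern in the string.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", data)
print emails   # ['info@example.com', 'sales@example.org']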


Os

The "os" module in Python allows us to perform operating system-dependent operations.

import os

Some sample usages are creating directories using the mkdir function, renaming a file using the
rename function, killing a process using the kill function, and displaying the list of entries in a
directory using the listdir function.
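A short sketch of a couple of these calls (the directory and file names are placeholders):

import os

# Create a working directory, list the current directory contents, then move a file into it.
os.mkdir("osint_output")
print os.listdir(".")
os.rename("old_report.txt", "osint_output/report.txt")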




When executing this code, it will prompt the message "Enter something"; once we input a value it
will generate the response accordingly. For an input value "a" it will generate the output "aaaa".

COMMON MISTAKES 

Some common issues faced during the execution of Python code are as follows. 

Indentation 

As shown in the examples above, Python uses indentation for grouping code. Some people use
spaces for this and some use tabs. When running code written by someone else, or modifying it, we
sometimes face an indentation error. To resolve this error, check the code for proper indentation
and correct the offending instances; also make sure not to mix tabs and spaces in the same code, as
it creates confusion for the person looking at the code.

Libraries 

Sometimes people have completely correct code, yet it fails to execute with a library error. The
reason is a missing library that is being called in the code. Though it is a novice mistake, sometimes
experienced people also don't read the exact error and start looking for errors in the code. The
simple solution is to install the required library.

Interpreter version 

Sometimes the code is written for a specific version of the language and when executed in a
different environment, it breaks. To correct this, install the required version and specify it in the
code as shown earlier in this chapter, or execute the code using the specific interpreter. Sometimes
there are multiple scripts which require different versions; to solve this problem we can use
virtualenv, which allows us to create an isolated virtual environment where we can include all the
dependencies needed to run our code.

Permission 

Sometimes the file permissions are not set properly to execute the code so make the 
changes accordingly using chmod. 

Quotes 

When copying code from some resources such as documents and websites there is sometimes a
conversion between the single quote (') and the grave accent (`), which causes errors. Identify such
conversions and make the changes to the code accordingly.

So we have covered the basics of the language; let's see some examples which can help us
understand the concepts, see their practical usage, and also get introduced to some topics not
discussed above.

Similar to Shodan, discussed in a previous chapter, there is another service called ZoomEye. In
this example we will be creating a script which will query




extracts data in the form of an entity (or entities) based upon the relationship. Maltego has a lot of
inbuilt transforms and keeps updating the framework with new ones, but it also allows us to create
new ones and use them; this can be very helpful when we need something custom according to our
needs.

Before we move any further we need the "MaltegoTransform" Python library by Andrew
MacPherson, which is very helpful in local transform development. It can be downloaded from the
page https://www.paterva.com/web6/documentation/developer-local.php. Some basic examples of
local transforms created using the library are also present at the bottom of the page. Once we have
the library in our directory we are ready to go and create our first transform.

To create any program first we need to have a problem statement. Here we need to create a
transform, so let's first identify something that would be helpful during our OSINT exercise. There
is a service called HaveIBeenPwned (https://haveibeenpwned.com) created by Troy Hunt which
allows users to check if their account has been compromised in a breach. It also provides an
application programming interface (API) using which we can perform the same function. We will
be using v1 of the API (https://haveibeenpwned.com/API/v1) and provide an e-mail address to
check whether the supplied e-mail has any breached account associated with it.

To utilize the API we simply need to send a GET request to the service in the form shown
below and it will provide a JSON response listing the website names.
https://haveibeenpwned.com/api/breachedaccount/{account}

Let's first specify the path of the interpreter:

#!/usr/bin/python

Now we need to import the library MaltegoTransform:

from MaltegoTransform import *

Once we have the main library we need to import some other libraries that will be required: "sys"
to take user input and urllib2 to make the GET request.

import sys
import urllib2

Once we have imported all the required libraries, we need to create a MaltegoTransform() object,
assign it to a variable, and pass the user input (the e-mail address) from the Maltego interface to it.

mt = MaltegoTransform()
mt.parseArguments(sys.argv)

Now we can pass the e-mail value to a variable so that we can use it to create the URL required to
send the GET request.

email=mt.getValue()

Let's create a variable and save the base URL in it.

hibp="https://haveibeenpwned.com/api/breachedaccount/"
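The remaining steps of the transform build the full request URL and return the result to Maltego. Purely as a hedged, standalone illustration of the same v1 API call outside Maltego (the User-Agent value and the error handling below are assumptions, not part of the authors' transform):

#!/usr/bin/python
# Standalone sketch: query HaveIBeenPwned v1 for breaches linked to an e-mail address.
import sys
import urllib2

email = sys.argv[1]
url = "https://haveibeenpwned.com/api/breachedaccount/" + email
req = urllib2.Request(url, None, {'User-Agent': 'Mozilla'})
try:
    print urllib2.urlopen(req).read()      # JSON list of breach names
except urllib2.HTTPError:
    print "No breach found for " + email   # the API answers 404 when nothing is found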




C:\Python27>emailhibp.py foo@bar.com
<MaltegoMessage>
<MaltegoTransformResponseMessage>
<Entities>
<Entity Type="maltego.Phrase">
<Value>Pwned at ["Adobe","Gawker","Stratfor"]</Value>
<Weight>100</Weight>
</Entity>
</Entities>
<UIMessages>
</UIMessages>
</MaltegoTransformResponseMessage>
</MaltegoMessage>


FIGURE 13.7 

Transform output. 

We can see that the response is an XML styled output and contains the string
"Pwned at ["Adobe","Gawker","Stratfor"]". This means our code is working properly and we can
use it as a transform. Maltego takes this XML result and parses it to create an output. Our next step
is to configure this as a transform in Maltego.

Under the Manage tab go to the Local Transform button to start the Local Transform Setup
Wizard. This wizard will help us configure our transform and include it in our Maltego instance.

In the Display name field provide the name for the transform and press tab; it will generate a
Transform ID automatically. Now write a small description for the transform in the Description
field and the name of the author in the Author field. Next we have to select the entity type that this
transform takes as input; in this case it would be Email Address. Once the input entity type is
selected we can choose the transform set under which our transform will appear, which can also be
none.



FIGURE 13.8 

Transform setup wizard. 




Now click next and move to the second phase of the wizard. Here, under the Command field, we
need to provide the path to the programming environment we are going to use to run the transform
code. In our case it would be

/usr/bin/python (for Linux)

C:\Python27\python.exe (for Windows)

Once the environment is set we can move to the Parameters field; here we will provide the path
to our transform script. For example,

/root/Desktop/transforms/emailhibp.py (for Linux)

C:\Python27\transforms\emailhibp.py (for Windows)

One point to keep in mind here is that if we select the transform file using the browse button
provided in front of the "Parameters" field, it will put only the file name in the field, but we need
the absolute path of the transform to execute it, so provide the path accordingly.



FIGURE 13.9 

Transform setup wizard. 


After all the information is filled in, we simply need to finish the wizard and our transform is ready
to run. To verify this, simply take an e-mail address entity and select the transform from the
right-click menu.







FIGURE 13.10 

Select transform. 



FIGURE 13.11 

Transform execution. 

Now we have created our first transform and also learned how to configure it in Maltego. Let's
create another simple transform. For this example we will be using the website
http://www.my-ip-neighbors.com/. It allows us to perform a reverse IP domain lookup; simply said,
it lists the domains sharing the same IP address as the provided domain. In the previous transform
we provided an e-mail address as the input; here we require a domain name. This website provides
no API service, hence we will have to send a raw GET request and extract the domains out of the
web page using regular expressions through the library "re".




#!/usr/bin/python
from MaltegoTransform import *
import sys
import urllib2
import re
mt = MaltegoTransform()
mt.parseArguments(sys.argv)
url=mt.getValue()
mt = MaltegoTransform()
opencnam="http://www.my-ip-neighbors.com/?domain="
getrequrl=opencnam+url
header={'User-Agent':'Mozilla'}
req=urllib2.Request(getrequrl,None,header)
response=urllib2.urlopen(req)
domains=re.findall("((?:[0-9]*[a-z][a-z\\.\\d\\-]+)\\.(?:[0-9]*[a-z][a-z\\-]+))(?![\\w\\.])", response.read())
for domain in domains:
    mt.addEntity("maltego.Domain", domain)
mt.returnOutput()

*http://txt2re.com/ can be used to create regular expressions.

Similarly we can create a lot of transforms which utilize online services, local tools (e.g., an Nmap
scan), and much more using Python. The examples shown above and some more can be found at
https://github.com/SudhanshuC/Maltego-Transforms. Some other interesting transforms can be
found at https://github.com/cmlh, else they are just a quick GitHub search away
(https://github.com/search?utf8=%E2%9C%93&q=maltego+transform).

There is also a Python-based framework available which allows creating Maltego transforms
easily, called Canari (http://www.canariproject.com/).

There are various topics which we have not covered, but the scope is limited and the topic is
very vast. Some of these are exception handling, multiprocessing, and multithreading. Below are
some resources which can be helpful in this quest of learning Python.


RESOURCE 

https://github.com/dloss/python-pentest-tools 

A great resource to learn more about Python and its usage is the Python docs
itself: https://docs.python.org/2/. Another great list of Python-based tools with a focus
on pentesting is present at https://github.com/dloss/python-pentest-tools. It would be
great to create something interesting and useful by modifying, combining, and adding
to the mentioned resources. The list is divided into different sections based on the
functionality provided by each tool mentioned.

So we have covered some basics of the Python language and also learned how to
extend the Maltego framework through it. Through this chapter we have made an attempt
to learn about creating our own custom tools and modifying existing ones in a quick fashion.




This chapter is just an introduction to how we can create tools with a minimum
amount of coding. There is certainly room for improvement in the snippets we have
shown, in functional as well as structural terms, but our aim is to perform the task as
quickly as possible.

Though we have tried to cover as much ground as possible, there is still so much
more to learn when it comes to Python scripting. Python comes with a large set of
useful resources and is very powerful; by using it one can create a powerful toolset.
Recon-ng (https://bitbucket.org/LaNMaSteR53/recon-ng) is a great example of this.
We discussed this reconnaissance framework in a previous chapter. One
great way to take this learning further would be to practice more and create tools
which could be helpful for the community, and contribute to existing ones such
as recon-ng.

Slowly we are moving toward the end of this journey of learning. We have been 
through different aspects of intelligence gathering in different manners. Moving on 
we will be learning about some examples and scenarios related to our endeavor, 
where we can utilize the knowledge we have gained in a combined form. 




CHAPTER 14

Case Studies and Examples



INFORMATION IN THIS CHAPTER 

• Introduction 

• Case studies 

• Example scenarios 

• Maltego machines 


INTRODUCTION 

After working with so many tools and techniques and going through so many processes
of information gathering and analysis, now it's time to see some scenarios and
examples where all this comes together for practical usage. In this chapter we will
include some real scenarios in which we or people we know have used OSINT (open
source intelligence) to collect the required information starting from very limited
information. So without wasting any time let's directly jump into case study 1.


CASE STUDIES 

CASE STUDY 1: THE BLACKHAT MASHUP 

One of our friends returned from the Black Hat US conference and he was very happy
about the meetings and all. Our friend works for a leading security company and
takes care of US sales. He was very excited about a particular lead he got there. The
person he met there was in a senior position at a gaming company and was interested
in the services offered by our friend's company. They had a very good networking session
in the lounge while having drinks, and in the excitement he forgot to exchange cards.
So he had to find the person and send him the proposal that he had committed to.

• Problem No. 1: He forgot the person's full name but remembers his company
name and location.

• Problem No. 2: During the discussion the other person said that, as many people
approach him for such proposals, he uses a different name on LinkedIn.

• Problem No. 3: We know the position of the other person, but it is not a unique
position such as CEO or CTO.



When he came to us with this case we had a gut feeling that we could find him.
We had some sort of information about the person, though it did not include any
primary information such as an e-mail address or full name.

These are the steps we followed.

The first thing we asked him was whether he could recognize the person's picture
or had forgotten that also; to our good luck he said "yes." So the major point in our case
was that if we found some candidate profiles, he could validate and confirm the right person.

Step 1: 

As usual we first started with a simple Google query with the first name, his position,
and the company name. Let's say his position is senior manager and the company name
is abc.inc. The query we used is

Senior manager abc.inc

Step 2: 

We tried the same on Facebook to get an equivalent profile, but nevertheless
no leads.

Step 3: 

We went to LinkedIn, tried the same simple search, and failed.

Step 4: 

We went to that company's profile page and tried to visit all the employees' profiles,
but we found that there are more than 7000 registered employees there on LinkedIn
and it's really a tough task to find anyone that way.

Step 5: 

As we covered in Chapter 2, LinkedIn provides an advanced search feature and we
had some of the data for its fields directly. So we decided to use that.

https://www.linkedin.com/vsearch/p?trk=advsrch&adv=true

We filled in the fields such as title, company, and location, as our friend had
this information. As a result we got many equivalent profiles, but this time far
fewer results, which we went through one by one and finally found the person in the
twenty-first result, as he had shared that he had recently been to the conference. After
visiting his profile we got a bit more detail about that person and our friend
confirmed that we had found what we were looking for.

What could have been done after this? 

We might get his primary e-mail id, company e-mail id, and other details using
different sources such as Maltego or a simple Google search. Using the image we might
go for a reverse image search to get related images and their sources. We might get the
blogs or websites created by that person and many more. There were endless
possibilities, but we stopped there because, for the time being, that was out of our scope.
We sent the link to our friend, who then sent him a connection request and later got the
deal, and so we got a small treat.




Step 1: 

As we had the name of the person and the company name, we directly searched for the
person on LinkedIn. We found his profile, and the profile contained a lot of information
such as his current and previous work experience. We found that the person is one of
the technical leads of that company. The LinkedIn profile also included some of his
articles and latest achievements. The person had recently earned the OSCP (Offensive Security
Certified Professional) certification. We also found the GitHub account link on the LinkedIn
profile. We visited each of his articles, and most of them were about how he
found bugs in many major sites, including some zero-days in popular CMS systems.

Step 2: 

After getting these details we visited his GitHub account. He had written scripts in
Python to automate his testing process.

Step 3: 

We did a simple Google search on his name and got many links along with a
SlideShare account. We visited that SlideShare account. There were some presentations
on how to write your own IDS rules. In one of the older posts we found a comment
linking to his older blog.

Step 4: 

We visited that old blog of his, which covered different road trips he did on
his Bullet motorcycle.

Step 5: 

We searched for his Twitter account and found an interesting post that he had recently
attended one of the popular security conferences in Goa, India.

Step 6: 

We visited the conference site and found that the person, Mr. John Doe, had given
a talk on network monitoring.

Step 7: 

We searched for him on Facebook and got information about his hometown, current
location, educational details, and so on.

Step 8: 

A quick people search on Yasni provided us a link to another website of a
local security community where we found his phone number, as he was the chapter
leader. We verified this phone number through Truecaller and it checked out
right.




It took almost 25-30 min, after which we stopped digging further. In the meanwhile
our friend was ready with all the information he had gathered from the company
website. Based on the information we collected from different sources we concluded
the following.

• Save the phone number and greet him with his name in case he calls. 

• The first thing you need to ask him is how his talk went in Goa.

• Read his talk abstract and tell him it was great and you regret that you
missed the talk.

• Expect questions on IDS rule writing; you can refer to the SlideShare presentations
for answers.

• Expect some questions on tool automation and that too in Python. So a quick 
Python revision was required. 

• Expect some questions on network penetration testing as he recently did 
OSCP. 

• Expect some questions on web application security, bug bounties, and zero-days 
as he got listed in many. 

• If he asks you about hobbies, tell him about your road trips and how you wanted to
have a Bullet motorcycle but haven't got one yet.

• If you get any question related to your vision, tell him something
related to the company's existing vision aligned with your personal
thoughts.

• If he asks you where you see yourself after some years or future plans, tell him 
you want to go for OSCP certification and be a Red team leader. This was true 
anyway. 

Our friend had very good knowledge and experience in pentesting, and
because of his expertise, and with a little homework on the company and on the
background of the person, he got selected. Mr. John Doe was not only happy with
his technical skills but also because he and our friend had many things
in common.

So these were some interesting case studies; we have certainly added as well as
subtracted some points here and there as required, but all in all these are the kinds of
situations everyone faces. Let's learn about some basic types of information related
to commonly encountered entities and how to deal with them.

It's quite easy to start with primary information such as a name or e-mail id and
collect all the other information, but there are cases where we might not have primary
information. That does not mean we cannot get this primary information from secondary
information. The process might be a bit more difficult but it's possible. So now we will
discuss in particular a person's details. What can be collected about a person?
Where and how?

Below are some of the pieces of information we might be interested in collecting about a
person.




PERSON: 

• First Name 

• Last Name 

• Company Name 

• E-mail Address (Personal) 

• E-mail Address (Company) 

• Phone Number (Personal) 

• Phone Number (Company) 

• Address (Home) 

• Address (Company) 

• Facebook Account URL 

• LinkedIn Account URL

• Twitter Account URL 

• Flickr Account URL 

• Personal Blog/Website URL 

• Keywords 

• Miscellaneous 


From the above list we can start with any point and gather most of the rest.
The steps may differ based on what we got as a source and how we get to the other
items one by one, but we will be using the same tools/techniques, just in a different order.

Whatever we take as a source, basically we need to start with a simple Google
search, or a search on any other popular traditional search engine such as Yandex. If we get
any related information, we use that same information to collect the other related
information by treating it as the source.

Let's say we got a simple name that contains a first name and last name; then we can
simply use a Google query to get results. Let's say using Google we were able to get
the personal blog or website.

Visit that site to search for related information, such as any details about the person: it
may be the area of interest, age, date of birth, e-mail, hometown, educational details,
or any such information that can be used to get other details.

Let's say we got the educational details. Open Facebook and try to search for
the name along with the educational details. We may get the person's profile. On Facebook we
will get lots of information such as the company he/she is working at, friends, pictures
of him/her, and sometimes other profile links along with the personal e-mail address
as well.

Now using the company name and the person's name we can get the LinkedIn profile
quite easily and can craft an e-mail address. Generally companies use a typical pattern
to create e-mail addresses. Let's say the company name is abc.inc and the site is
www.abc.com, and it uses the pattern of the first letter of the first name followed by
the last name, without any spaces. So from the person's name and company name we can
easily craft the e-mail address, or we can use a tool like theHarvester to harvest
e-mail addresses from the company domain name, and after looking at all the e-mails we
can easily pick the one associated with the person. In this way it is possible to get or
collect information through correlation.




DOMAIN: 

• Domain 

• IP Address 

• Name Server 

• MX server 

• Person 

• Website 

• Subdomains 

• E-mail Samples 

• Files 

• Miscellaneous 


So as we discussed earlier, let's take the domain as the primary entity, and from that
we want to get all the other information mentioned above. If we want to simply
get the IP address of that domain, we just need to run a simple ping command in the
command prompt or terminal, based on the operating system we use.

ping <domain name> 

This command will execute and provide the IP address of the domain.

For other domain-specific information, there are domain tools freely available
on the internet, and from the Whois record we will get different information such
as the registered company name, name server details, registered e-mail ids, IP address,
location, and much more. Resources like w3dt.net can be very helpful here. Directly
using a domain tool we can find lots of information about a domain, or else we can use
the different domain-specific Maltego transforms for the same.
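To get a feel for what happens behind such a lookup, the sketch below performs a raw Whois query itself: Whois is a simple text protocol where we connect to port 43, send the domain name followed by CRLF, and read back the record. The server used here is the registry Whois server for .com/.net domains, which is an assumption; other TLDs are served by different Whois servers.

import socket

def whois_query(domain, server="whois.verisign-grs.com"):
    # connect to the Whois server on TCP port 43 and send the query
    s = socket.create_connection((server, 43), timeout=10)
    s.sendall(domain + "\r\n")
    response = ""
    while True:
        data = s.recv(4096)
        if not data:
            break
        response += data
    s.close()
    return response

print(whois_query("example.com"))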

We can also use theHarvester to collect subdomains, e-mail addresses etc., from a
domain name. From the e-mail addresses we can search for the profiles of the persons
on different social networking sites.

And to get subdomains and particular files from a domain we can use SearchDiggity,
Google, or Knock (a Python tool).

To get different subdomains we can use the site operator, or create a Python script
which takes subdomain names from a wordlist and enumerates them along with the
provided domain.

site:domainname 
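A minimal sketch of the wordlist-based subdomain enumeration script mentioned above could look like the following. The wordlist is a tiny inline example (a real run would load a much larger list from a file) and example.com is just a placeholder domain; socket.gethostbyname() is used to check whether a candidate actually resolves.

import socket

# small illustrative wordlist; in practice read a larger list from a file
wordlist = ["www", "mail", "blog", "dev", "vpn"]

def enumerate_subdomains(domain):
    found = []
    for word in wordlist:
        candidate = word + "." + domain
        try:
            ip = socket.gethostbyname(candidate)
            print(candidate + " -> " + ip)
            found.append(candidate)
        except socket.gaierror:
            # the candidate does not resolve, so skip it
            pass
    return found

enumerate_subdomains("example.com")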

To get a particular type of file from the domain with a keyword we can use the filetype
or ext operator and run the below query:

site:domainname keyword filetype:ppt 

So in this way we can get all the domain-specific information from different 
sources. 

So these were some case studies and examples in which OSINT can be collected 
and be helpful in our personal and professional life. 

As promised earlier in our next topic we will be learning about Maltego machines. 




MALTEGO MACHINES 

We have covered various aspects of Maltego in previous chapters, from understanding
the interface to creating local transforms. As this chapter is more about combining
the knowledge we have covered till now, related to Maltego we will learn
how to create Maltego machines. Although we have already defined what a Maltego
machine is, for a quick recall: it is a programmatically connected set of transforms. It
allows us to take one entity type as input and move toward another type(s) which is
not directly connected to it, but reachable through a set/sequence of transforms. There are some
inbuilt machines in Maltego, such as Company Stalker, which takes a domain entity as
input and runs various transforms in sequential fashion to get different types of information
from it, such as e-mail addresses, files etc.



FIGURE 14.1

Maltego "Company Stalker" machine.


To create our own machine we need to use the Maltego Scripting Language (MSL).
The documentation for MSL is available as a PDF at http://www.paterva.
com/MSL.pdf. The documentation is clear and simple, and anyone having basic
programming skills can easily understand it. As all the terms and processes are
clearly described there, we do not need to cover them again, so we will straight away
jump to creating our own simple machine using the local transforms we learned to
create in a previous chapter.

Creating a Maltego machine is pretty simple. First we need to go to the Machines
tab, under which we can find the New Machine option. Clicking on it will bring up a
window where we need to provide the name and other descriptive details related
to the machine we are going to create. In the next step we need to choose the type of
machine we are going to create. For this we have three options:

• Macro: runs once 

• Timer: runs periodically until stopped 

• Blank: a blank template 



FIGURE 14.2 

Create Maltego Machine 

Once we have selected the machine type we can write the code for our machine
and include transforms in it at the appropriate positions, from the right-hand side
panel, by selecting a transform and double clicking on it. The "start" block contains
the transforms and all the other execution parts. The "run" functions are used to execute a
specific transform. To run functions in a parallel fashion we can include them inside
"paths." Inside "paths" we can create different "path" blocks which will run in parallel
with each other, but the operations inside a path will run sequentially. Similarly we
can provide different values, take user inputs, use filters etc.

Let's create a simple machine which extracts e-mail ids from a provided domain
and further runs our HIBP local transform on these. For this we need to provide
the machine name and select the macro type machine. Next we need to include the
inbuilt transforms which can extract e-mails from a domain, such as domain to e-mail
using search engine, Whois etc. Next we need to include our local HIBP transform.
As we need to run these in parallel we need to create a separate "path" for each e-mail
extraction transform. Our final code looks like this:

machine("sudhanshuchauhan.domaintoHIBP",
    displayName: "domaintoHIBP",
    author: "Sudhanshu",
    description: "Domain name to HaveIBeenPwned") {

    start {
        paths {
            path {
                run("paterva.v2.DomainToEmailAddress_AtDomain_SE")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_SE")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_Whois")
                run("sudhanshuchauhan.emailhibp")
            }
            path {
                run("paterva.v2.DomainToEmailAddress_PGP")
                run("sudhanshuchauhan.emailhibp")
            }
        }
    }
}



FIGURE 14.3 

Our Maltego machine output. 

Two important things need to be kept in mind: our local transform must
be integrated into Maltego before creating the machine, and the input and output data
types need to be taken care of when creating a sequence.




So we have learned to create a Maltego machine. Though there is still much more to
explore and learn related to Maltego, we have attempted to touch upon its every
important aspect.

In this chapter we have learned about combining all the knowledge we gained
till now and also saw some practical scenarios and examples. This is important, as
in real-life projects it's not just about knowing things but also about implementing
and utilizing them in an integrated manner according to the situation and generating
a fruitful outcome.

In our next and last chapter we will be learning about certain general topics
related to the internet which are often connected directly or indirectly to information
gathering. Having a basic understanding of these terms will be helpful for
anyone utilizing the internet for investigative purposes.





Related Topics of Interest 


15 


INFORMATION IN THIS CHAPTER 

• Introduction 

• Cryptography 

• Data recovery 

• IRC 

• Bitcoin 


INTRODUCTION 


In previous chapters we have learned about various topics which are associated with
collecting and making sense out of data. We learned about social media, search engines,
metadata, the dark web, and much more. In this last chapter we will briefly cover some
topics which are not directly related to open source intelligence (OSINT) but to
computing and internet culture and its evolution. If you practice the information
provided in previous chapters you are very likely to encounter these topics somewhere.


CRYPTOGRAPHY 


There has always been a need to transfer messages from one location to another. Earlier,
people used to send messages through messengers who would travel long distances to
deliver them. Slowly a need to make this transmission secure came up. In situations like
war, the message being intercepted by the enemy could have changed the whole situation.
To tackle such scenarios people started to invent techniques to conceal the original
message, so that even if the message is intercepted it cannot be understood by anyone
except the intended receiver. One of the simplest examples is the Caesar cipher, in which each
letter is replaced by another with a fixed alphabet position difference; so if the position
difference is 3 (right), then A would become D, B would become E, and so on. In the modern
era, technology has advanced a lot and so have the techniques to encrypt as well as break encryption.
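As a quick illustration, here is a minimal Python sketch of the Caesar cipher just described, using the same shift of 3. It only demonstrates the idea and is in no way suitable as real encryption.

def caesar_encrypt(text, shift=3):
    # shift each letter by a fixed amount, wrapping around the alphabet;
    # non-alphabetic characters are left untouched
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

def caesar_decrypt(text, shift=3):
    return caesar_encrypt(text, -shift)

print(caesar_encrypt("ATTACK AT DAWN"))    # DWWDFN DW GDZQ
print(caesar_decrypt("DWWDFN DW GDZQ"))    # ATTACK AT DAWN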


BASIC TYPES 
Symmetric key 


In this type of cryptography both parties (sender and receiver) use the same key
to encrypt and decrypt the message. A popular symmetric key algorithm is the Data
Encryption Standard (DES); there are also modern variants of it such as Triple DES.




Asymmetric key 

In this type there are two keys, public and private. As the name suggests the public
key is openly distributed but the private key remains secret. The public key is used
to encrypt the message whereas only the private key can decrypt it. This solved a major
issue with symmetric keys, which was the need for multiple keys for communication
with different parties. RSA is a good example of an asymmetric key algorithm.

Some other associated terms:

Hashing 

In simpler terms, hashing is converting a character string into a fixed size value. Usually
the hash is of small length. Some commonly used hashing algorithms are MD5,
SHA1 etc.

Encoding 

It is simply about converting characters into another form for the purpose of data
transmission, storage etc. It is like translating a language into another so
that the other party can understand it. Commonly used encodings are UTF-8,
US-ASCII etc.

The basic difference between these is that encrypted text requires a key to be converted
back to plain text, and it is mainly used for the confidentiality of a message. In
hashing, the hashed text cannot be reversed back to the original text, and it is mainly
used for integrity checks and validation. Encoded text can be decoded back without
needing any key.
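The difference is easy to demonstrate with Python's standard hashlib and base64 modules. The snippet below (written in Python 2 style, like the earlier examples in this book) hashes and encodes an example string: the digests cannot be turned back into the message, while the Base64 form decodes straight back without any key.

import hashlib
import base64

message = "open source intelligence"

# hashing: one-way, fixed-size digests used for integrity checks
print(hashlib.md5(message).hexdigest())
print(hashlib.sha1(message).hexdigest())

# encoding: a reversible transformation with no key or secret involved
encoded = base64.b64encode(message)
print(encoded)
print(base64.b64decode(encoded))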

We came across different examples, cases, and scenarios where we learned how data
or information plays a vital role in this digital world. Similarly, any digital data stored
in devices such as a computer, laptop, mobile device etc., is equally important. As
these are personal devices, they contain more personal data and so should be taken
care of carefully. Any hardware issue, software malfunction, device crash, or theft can
lead to either losing that important data or it ending up in the wrong hands, and the
consequences of that are much worse. So storing any important data in digital form requires a
meaningful effort to make it secure. There are many solutions available, both open
source as well as commercial, to store data securely on these devices. Choose
any of those based on the level of confidentiality of the data. Apart from storing the
data securely and locally on a device, there are also cloud solutions available to
store our data in one place so that we can retrieve and use it as we desire.
Along with data storage and data transmission it is also recommended to take
secure backups from time to time to avoid any accidental loss of data. These solutions
are tightly based on what we learned above, and that is cryptography, or encryption.
Today we frequently use cryptography on a daily basis through technologies such as
SSL/TLS, PGP, digital signatures, disk encryption etc. So here we can conclude that
encryption plays a vital role in our day to day life to secure our digital or virtual life.

With the increase in computation power, the ability to crack encrypted messages
has also evolved. Attacks such as brute-force and dictionary attacks are easy to perform
at high speed. Also there are weaknesses in the algorithms, which make it easier
to perform cryptanalysis on them. Given enough time and computation power any
encrypted text can be decrypted, so today the algorithms used attempt to make the
process so time consuming that the decrypted text becomes worthless in the time taken
to crack it.


DATA RECOVERY/SHREDDING 

Due to technological advancement nowadays we prefer to store almost everything
in digital form. A person who needs to send his/her documents does not want to
visit a photocopy shop; he/she just wants to scan the hard copy once and use the
same soft copy a number of times. This is just a simple example to understand human
behavior nowadays. So storing important data as a soft copy, or in digital form,
raises some security risks. As we discussed above, damage to the device or
accidental deletion can lead to the loss of our important data. We just learned some
precautions, or simply what to do with digital data. But what if it gets deleted?

There are possible ways to recover it. For a naive user, data recovery is only possible
when the data is still present in the trash or recycle bin, but that's not so. The capability
of data recovery goes way beyond that. This is because of the very nature of the data
storage and deletion functions implemented by the operating system. To understand
this we must understand the basic fundamentals of data storage, or how data gets
stored in different storage devices.

There are different types of storage devices such as tape drives, magnetic storage
devices, optical storage devices, and chips. Tape drives are not generally used
for personal use; earlier they were an integral part of enterprise storage systems, but now
there is a possibility that they are being deprecated, so let's not talk about those. Apart from
tape drives the other three are widely used. Magnetic devices are nothing but the hard
disk devices we use, popularly known as HDDs or hard disk drives, which store all
the data. When we delete data from our system the operating system does not delete
the data from the magnetic disk; it just removes the address reference to that part
from the address table. Though the concept is quite the same for other media
types such as DVDs, as we use those storage devices for backup and HDDs
for general storage, we will focus on HDDs only. As we discussed, deleting data
from the system means removing its memory location details from the address table.
So what is an address table and how does it work? It's quite simple. Generally when we
store data on the device it takes some memory from the HDD. The starting memory
location and the ending memory location define a piece of data on the hard disk. All these
memory location details are stored in a table called the address table. So when we search for
particular data the system checks the address table to get the memory locations allocated
for it. Once it gets those memory locations it retrieves the data for us. As the data is
still present on the hard disk after deletion, we can recover it, unless it
is overwritten by other data. Here deleting means deleting data from the system as well
as from the trash or recycle bin. So now we have some idea why data can be recovered after
deletion, but the major question that still stands is how? Let's take a look into that as well.





FIGURE 15.2 


Data shredding using FileShredder. 


INTERNET RELAY CHAT 

IRC, or Internet Relay Chat, is like old school for many. It was developed by
Jarkko Oikarinen in the late 1980s. Though it was developed more than two decades back,
the application is still popular and people still love to use it. The statistics
say it lost half of its users in the last decade, but for a product this old, the fact that
people are still willing to use it is a great achievement.

IRC is quite similar to any other chat application. It follows a client-server
architecture and uses the TCP protocol for communication. Earlier it used plain text
communication but now it also supports TLS, or Transport Layer Security, for encrypted
communication. The major reason for its development was to use it as group chat
software, and it serves that purpose quite well. What we generally call a chat room
is, in IRC terms, called a channel. Unlike other chat clients it does not force a user to
register, but a user has to provide a nickname to start chatting. A user can chat in a
channel as well as directly with another user using the private message option. IRC is
widely used in different discussion forums and we love to use it whenever we get an
opportunity.

Normally to use IRC we need to install an IRC client on our system. There are many
clients available on the internet for all kinds of operating systems, so download a
client which supports the operating system being used. Once we have an IRC client
installed, connect to a channel to start communicating with fellow channel members.
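Under the hood an IRC client simply exchanges plain-text commands over a TCP connection. The following minimal Python sketch registers a nickname and joins a channel; the server, nickname, and channel are example values, and a complete client would also have to answer the server's PING messages with PONG to stay connected.

import socket

server = "chat.freenode.net"    # example IRC server
nick = "osint_demo_123"         # example nickname, pick any unused one
channel = "#python"             # example channel

irc = socket.create_connection((server, 6667))
# register the connection, then join the channel
irc.sendall("NICK " + nick + "\r\n")
irc.sendall("USER " + nick + " 0 * :" + nick + "\r\n")
irc.sendall("JOIN " + channel + "\r\n")

# print whatever the server sends; a real client would parse these lines
while True:
    data = irc.recv(4096)
    if not data:
        break
    print(data.strip())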




The chat process is also quite the same as any normal chat process. It's basically line-based
chat: one user will send a message in a line, then another will reply. Due to its
anonymity most hackers prefer the use of IRC. The major question here is how it's
going to help us in OSINT. It's quite simple: as there are various channels available, we
can choose one based on our interest, crowdsource our question, and get responses
from different experts. We need to be in the right place at the right time to discuss what
is happening in the cyber world. We can get a clear picture of what is happening
all over the world and, if we are lucky enough, we might even get future
predictions, such as which group is preparing for a distributed denial-of-service
(DDoS) attack on a company, what the possible targets are, what attack
vectors hacktivists are currently using, and many more. The information we get from here
can be used to define cyberspace, trends in cyberspace and future predictions, discuss
a query etc. So next time do not hesitate to use IRC; just provide a fancy name and
enjoy chatting. A simple web-based IRC platform is http://webchat.freenode.net/.
Simply enter a nickname and channel name, and start to explore.



FIGURE 15.3 

Freenode IRC. 


BITCOIN 

Anyone into information security or keeping track of the world media, especially
technical journals, must have heard the term "bitcoin." It was popular in the technical
field for its new concept, but it started to trend among common internet users when the
value of 1 bitcoin touched almost $1000. Many must be aware of this but still
we will discuss some of the important facts about bitcoin. Bitcoin can be referred to as
electronic currency or digital cash, developed by Satoshi Nakamoto. Unlike normal
currencies it uses a decentralized concept called peer-to-peer technology for transactions.
It is based on an open source cryptographic protocol, using SHA-256 hashes in
hexadecimal form. The smaller unit of a bitcoin is called the satoshi; 100
million satoshis make up one bitcoin. Bitcoin can also be referred to as a payment
system, as no bank, organization, or individual has the power to control
or influence it. It is always in digital form and can be transferred with a click to any
individual across the world. There are pros as well as cons to this. Some of the
pros are that we can convert bitcoin into any currency independent of country, and we can
transact anonymously, hence it is quite popular on the darknet. No one can fake, create,
or devalue bitcoins. Similarly there are a number of cons, such as that a transaction
cannot be reversed. The security of bitcoin is low as it always exists in digital form,
and once a bitcoin wallet is deleted it is lost forever.

Now that we have a bit of understanding of bitcoins, it is important to know how to
store them as well. We can store bitcoin only digitally, because it is digital data. We need a
bitcoin wallet to store bitcoin. The major disadvantage of this is that if we accidentally
delete our wallet, we lose all the money, so take backups at proper intervals to avoid
any such incident. The original bitcoin project site is http://www.bitcoin.org/.



FIGURE 15.4 

Bitcoin wallet. 




Index 


Note: Page numbers followed by "f" and "b" indicate figures and boxes respectively.


A 

Academic sites, 18
Addictomatic, 22 
Addons 

buildwith. 48. 49f 
chat notification, A1 
Contact monkey, 52 
follow.net. 49 
onetab, 50-51 
Project Naptha, .31 
Reveye, 51 
Riffle, 49-50, 50f 
salesloft,31 
for security. 21 1 

HTTPS Everywhere, 212 
NoScript, 212 
WOT. 21 1.21 If 
shodan. 48 
Tineye, 31 
wappalvzer. 48 
whoworks.at. 50 
YouTubc,32 

Adobe Photoshop, 138-139 
Advanced search techniques. 25 
Faccbook, 25-26, 27f 
Linkedln, 27-30, 27f 
site operator. 31 
Twitter, 30, 3If 
Anonymity, 111 
Anonymous network, 164 
I2P, 165-168, 166f 
Onion Router, 164-165, 165f 
Antiviruses. 209 
Application-based proxy 
JonDo, 153-156, 154f-155f 
Ultrasurf, 152-153, 153f 
Application programming 
interface (API). 246 
Autocomplcte, 37-38 

B 

Betweenness, SNA 
bridges. 225 
defined. 222 
factors, 222, 222f 
gatekeeper/boundary spanners, 225 
isolate, 225-227, 225f-226f 


Liaison, 225 
network reach, 223-224 
nodes, role, 223 
star/hub, 224 
Bing 

features, 33 
operators 
/,_83 
“”,33 
0,-86 
&,36 
+, 85-87 
feed, 87, 87f 
filetype.36 
ip, 86, 87f 
site, 36 

Bitcoin, 272-274, 273f 

Bill ocker, 215, 2 15f 
Black Hat mashup, 253-254 
Boardreader, 23 
Bookmark, 32 
Browser, 32 
architecture, 34f 
browser engine, 33 
data persistence, 35-36 
error tolerance, 36 
javascript interpreter, 33 
networking, 33 
rendering engines, 33 
threads, 36 
UI backend, 33 
user interface, 33 
Chrome. 12 
Epic browser. 30 
features 

autocomplete. 37-38 
online and offline browsing, 36 
private browsing, 36-37, 37f 
proxy setup. 38 
Firefox, 12-13 
history of, 34 
operations, 33-34 
Buildwith, 48, 49f 
Business/company search, 32 
Glassdoor, 59-60, 60f 
Linkedln. 59 
Zoominfo,_60 




C 

Carrot2, 72-73, 73f 
CaseFile, 194-196, 195f-196f
CheckUsernames, 61
Chromium, 32 
Clearweb, 169-170 
Contactmonkcy, 52 
Content sharing websites, 17-18 
Corporate websites, 17 
Creepy 

applying filter, 104, 104f 
geolocation, 102 

Plug-in Configuration button, 102, 102f 
results, 103, 103f-I04f 
search users, 102, 103f 
Cryptography, 267 
asymmetric key, 268 
encoding, 268-269 
hashing, 268 
symmetric key, 267 
Custom browsers. 46 
categories, 45-46 
Epic, 40 
FireCAT, 43^44 
HconSTF, 40-41 
Mantra, 41-43, 42f 
Oryon C,44.44f 
TOR bundle, 45 
Whitehat Aviator, 44-45 
CyberGhost, 161-162, 162f

D 

Darknet, II 
12P 

create own site, 180-183, 180f-183f 
download and install. 176 
forum, JJJL 179f 
git, ill. 177f 
home. 176. 177f 
ld3nt, 179, 179f 
paste, 12S. 178f 
Tor 

DuckDuckGo, 123* 173f 
files created, 175, 175f 
HiddenServicePort, 174-175 
Hidden Wiki, 1IZ 172f 
Silk Road, UA 
Torchan, 124. 174f 
Tor hidden service, 175f. 176 
Tor Wiki, 122. 173f 
XAMPP,_LZ4 
Darkweb, 170 


Data encryption, 215-216, 215f 
Data Encryption Standard (DES), 267 
Data leakage protection (DLP), 145-146 
Doc Scrubber, 146 
geotags, 146 
MAT, 115 

MetaShield Protector. 145 
M y DLP. 145 
OpenDLP, 146 

Data management/visualization 
and analysis tools 

CascFilc, 194-196, 195f-196f 
excel sheet, 190-191, 19If 
flowcharts, 192-193, 193f 
KccpNotc, 197-198, 198f 
Lumify, 198-199, 199f 
MagicTree, 196-197, 197f 
Maltego, 193-194, 194f 
SQL databases, 191-192, 192f 
Xmind, 199-201, 200f 
data, 188 
information, 188 
intelligence, 188-190 
DataMarket, 21 

Data recovery /shredding, 269-270, 270f-271f 
Deepweb. See also Darknet 
advantages, 171 
defined, 1Z0 
di sad vantages, 171 
Diggity Downloads, 143 
Doc Scrubber, 146 
Domain name system (DNS), 5-6, 8 
DuckDuckGo, 62* 62f 

E 

E-mail, _5 

Email-Rapportivc, 256 

EmailShcrlock. 60. 61 f 

Epic browser, 40 

Error tolerance, 26 

Excel sheet, 190-191, 19If 

Exif Search, 136-137, 136f-137f 

F 

Facebook, 22, 25-26, 27f 
Fingerprinting Organizations with 
Collected Archives (FOCA), 

139-140, 140f 
FireCAT, 42^44 
Follow.net, 49 
Freedom, 40^41 
Frcenct, 183-185, I83f-I84f 




G 

Gecko, A5 
Gephi 

Data Laboratory tab. 219. 219f 
installation, .218 
Overview tab, 218-219, 219f 
Preview tab. 220 
Glassdoor, 59-60, 60f 
Google 
operators 
-,-&l 
..,81 
M 
*,£1 
AND, j£l 
allintext, 
allinurl,22 
AROUND, 80-81 
cache, .82* 135f 
calculator, .83. 
convertor, 83-84 
define, _8Q 
ext,_8Q 
filetype,_80 
info. 82 
intext, 79-80 
intitle, _SQ 
inurl. 79 
NOT, _&1 
OR, M 

related, 82 
site, 78-84, 79f 
time, .82 
weather, 82 
search categories, 28 
Google+, 24-25 
Google Chrome, 38-39 
Google Hacking Database, 83-84, 84f 
Google Translate, 149 
Government sites, 12 

& 

Hachoir-metadata, 138-139

Hackerfox, 40^11 

Hacking attempts, 208 

Hard disk drive (HDD), 269 

HaveIBeenPwned, 214, 214f, 256

Hello World program, 231, 23If 

HconSTF, 40-41 

Hideman, 163-164, 163f 

HTTPS Everywhere, 212 

HyperText Markup Language (HTML), 22 


i 

Id3nt. 179, 179f 
ImageRaider, 70-71 
Integrated database (IDBL 41 
Intelligence 

definition, 188-189 
managing data, 189 
structured data, 190 
Internet 
definition, 2 
history, 2 
working, 2 

Internet Relay Chat (IRC), 271-272, 272f 
Invisible Internet Project (I2P), 

165-168, 166f 

create own site, 180-183, 180f-183f 
download and install. 176 
forum, 128* 179f 
git, J22, 177f 
home, 12(2 177f 
Id3nt, 179, 1791- 
paste, J28* 178f 
IP address, 3-4 
iPhone, 137-138 

IRC. See Internet Relay Chat (IRC) 
ivMeta, 137-138, 138f 
Ixquick,^!, 55f 

J 

Jeffrey’s Exif Viewer, 134-136, 135f 
JonDo, 153-156, 154f-155f 
installation. 154 
interface. 154. 154f 
running, 155, 155f 
test, 154. 155f 
Windows users. 153 
JonDoFox,122 

K 

KeepNote, 197-198, 198f 
Key logger, 206 
Kngine. 63. 63f 
KnowEm,iil 

L 

LinkedIn, 23, 27-30, 27f, 258
LittleSis, 57-58 
Lumify, 198-199, 199f 

M 

MAC address, _5 

Magi cTree, 196-197, 197f 




Maltego 

Collaboration, 127-128, 128f 
commercial version. 124 
community version. 124 
domain to E-mail, 129-130, 130f 
domain to website IP. 128-129, 129f 
entity, _L24 
Investigate, 126 
machines, 125, 125f. 127 
Company Stalker, 263, 263f 
creating, 263-264, 264f 
HIBP local transform, 264 
MSL, 263 
output, 265, 265f 
Manage option, 126, I26f 
Organize option, 126, 126f 
person to website, 130-131, 13If 
transform. 124 

Maltego scripting language (MSL), 263 
Maltego Transforms, 245-251, 248f-250f 
Malwares 

Key logger, 206 
ransom wares, 206 
restricted sites, 205-206 
Trojan. 206 
virus, 206 

Mamma search engines, 55-56 
Mantra, 41-43, 42f
Market Visual, 58, 58f
Media access control address, 5
Metadata 

creation of, 133-134 
extraction tools 

Exif Search. 136-137, 136f-137f 
FOCA, 139-140, 140f 
hachoir-metadata, 138-139 
ivMeta, 137-138, 138f 
Jeffrey’s Exif Viewer, 134-136, 1351 
Metagoofil, 140-142, 141f-142f
impact, 142-143 
removal/DLP tools. 145 
Doc Scrubber, 146 
gcotags, 146 
MAT, 445 

MetaShield Protector, 145 
MyDLP,_L45 
OpenDLP, 146 
Search Diggity, 143-145 
Metadata anonymization toolkit (MAT), 145 
Metagoofil, 140-142, 141f-142f 
Meta search, .54 
Ixquick, .55* 55f 


Mamma, 55-56 
Polymetn, 54 : 54f 
MetaShield Protector, 145 
Microsoft Baseline Security Analyzer (MBSA), 
212, 213f 

Mozilla Firefox, 38-39 
MyDLP,445 

N 

Namechk. 61 
NerdyData, 66-67, 67f 
News sites. 17 
NoScript, 212 

0 

Ohloh code, hi 
Omgili, J3 
Onetab, 50-51 
.onion domain websites. 174 
Onion Router, 164-165, 165f 
Online anonymity 
IP address, 147-148 
proxy 

Google Translate, 442*. 150f 
page opened inside, 150, 151 f 
types of. 15 1 

whatismyipaddress, 149. 150f 
VPN, 161 

CyberGhost, 161-162, 162f 
Ilideman, 163-164, 163f 
Online scams/frauds, 207-208, 207f 
Online security 
addons. 211 

HTTPS Everywhere, 212 
NoScript, 212 
WOT,2_LL 21 If 
antiviruses. 209 

data encryption, 215-216, 215f 
hacking attempts, 208 
jail broken iPhone, 204-205 
malwares 

Key logger, 206 
ransom wares, 206 
restricted sites, 205-206 
Trojan, 206 
virus, 206 

operating system update, 210-211 
password policy, 213-214, 214f 
password reset, 204-205 
phishing, 207, 210 

scams and frauds, 207-208, 207f, 210 
shoulder surfing, 208-209 




social engineering. 209. 215 
spam message, 203-204 
tools, 212-213, 213f 
weak passwords, 208 
OpenDLP, 146 

Open source intelligence (OSINT) 
academic sites, ±8 
content sharing websites, 17-18 
corporate websites. 17 
demo, 255 
government sites, 33 
news sites. 17 
public sources, 16b 
search engines, 16-17 
tools and techniques, 101 
Creepy, 102-104 
Maltego, 124—131 
Recon-ng, 113-121 
Search Diggity, 110-113 
Shodan, 107-110 
TheHarvester, 105-107 
Yahoo Pipes, 121-124 
WEBINT,36 

weblogs/blogs, 18-19, 18f 
Operating system 
basic hardwares, _1H 
Linux, _LL 
Mac, 33 

Windows, 10-11 
Oryon C. 44. 44f 

OSINT. See Open source intelligence (OSINT) 

P 

PeekYou,32 
People search, .56 
LittlcSis, 57-58 
Market Visual, 38* 58f 
Peek You, .51 
Pipl, 56-57, 57f 
Spokco,36 
They Rule, 58-59 
Yasni,32 
Phishing, 207 
Pipl, 56-57, 57f 
Polymeta,33. 54f 
Ports, 4 
Pri me, 40-41 

Private browsing, 36-37, 37f 
Private IP address, 4 
Programming language 
Java, 11-12 
Python, _L2 


Project Naptha. 51 
Protocol, 4-5 
Proxy 

application-based proxy 
JonDo, 153-156, 154f-155f 
Ultrasurf, 152-153, 153f 
Google Translate, 332* 150f 
page opened inside. 150, 15If 
set up, 160-161, 160f 
in Chrome, 38 
in Firefox. 38 
web-based proxy, 156 

anonymouse.org, 156-158, 156f— 157f 
Boomproxy.com, 159-160 
FiltcrBypass, 159, I59f 
Zcnd2, 158-159, 158f 
whatismyipaddress,332* 150f 
Public IP address, 4 
Python. 230 

classes, 239-240 

common mistakes, 243-245, 244f 
data types, 232-235, 232f-234f 
functions, 239 

Hello World program, 231, 23If 
identifiers, 232 
indentation, 235-238 
installation. 230 
Maltego Transforms, 245-251, 
248f-250f 
modes, 230-231 
modules, 238-239 
programming vs. scripting, 229-230 
resource, 251-252 
user input, 242-243 
working with files 
os, 241 
re, 241 
sys, 241 
urllib2, 242 

R 

Ransom wares, 206 
Raw browsers, 38-40 
Recon-ng, 118f 

commands, 114b, 115 
installation. 1 14 
Linkedln, 1 19 

modules, 115, 115b, 116f-118f 
penetration testing, 320 
physical tracking, 119-120 
PunkSpider in progress. 120. 12If 
Rendering engines, 35 




Reverse image search. 69 
Google images, JiL 701* 

ImageRaider, 70-71 
TinEye, JO. 

Reverse username/e-mail search. 60 
CheckUsernames, 61 
EmailSherlock. 60. 61 f 
Facebook, 01 
KnowEm. 61 
Namechk,_61 
Reveye,01 
Riffle, 49-50, 50f 
Robots, JJ 
Robtex,_68 

S 

Salesloft, 51 
Search Diggity, 143-145 
basic requirement. 1 10 
interface. 110. 1 lOf 
NotlnMy Backyard, _L12, 112f 
scan-Bing tab. 111. 1 I2f 
scan-Google tab. 111. 11 If 
Shodan scan. 112. 113f 
Search engine optimization (SEO), J& 

Search engines, 53. See also specific search 
engines 

Secunia PSI, Jll 
Semantic search 

DuckDuckGo. 62. 62f 
Kngine, 03*63f 
Server, 1_ 

Shodan, 68-69, 69f 
banners, 107 
filters, 107, 108f 
popular searches, 107, 108f 
results for query ‘port: 21 country fin.”. 109. 109f 
results for query‘‘webcam,”, 108-109, 109f 
Shoulder surfing, 208-209 
Silk Road, 11A 
Small web format (SWF), M 
SNA. See Social network analysis (SNA) 

Social media intelligence (SOCMINT), Jll 
Social media search, 63 
SocialMenlion, 63-64, 64f 
Social Searcher, 64-65 
SocialMention, 63-64, 64f 
Social network analysis (SNA) 
edges, 2L8 

betweenness, 222-227, 222f 
directed edges, J2J_ 
ranking, 222 


type, J21 

undirected edges. 221 
weight, 221 
Gephi 

Data Laboratory tab. 219. 219f 
installation, 218 
Overview tab, 218-219, 219f 
Preview tab. 220 
network. 218 
nodes. 217. 220 
Social network websites, 21 f 
Facebook. 22 
features, 21 b 
Google+, 24-25 
LinkedIn, 23
Twitter, 2A 

Social Searcher, 64-65 
SOCMINT. See Social media intelligence 
(SOCMINT) 

Source code search, 66-67 
NerdyData, 66-67, 67f 
Ohloh code. 67 
Spiders, _LZ 
Spokeo,J6 

SQL databases, 191-192, 192f 
Storage devices. 269 
Surface web,_LZ 

T 

Tape drives. 269 
Thellarvester 

in action, 105, 105f 
HTML results, Jili. 106f 
sources, 106-107 
TheyRule, 58-59 
Tincye, JJL JO 

Top level domains (TLDs), 5-6, 6b 
Topsy,_6i, 65f 
Tor, 164-165, 165f 

DuckDuckGo. 173. I73f 
files created, 175, 175f 
HiddenServicePort, 174-175 
Hidden Wiki,_LZZ 172f 
Silk Road. 174 
Torchan, _LZ±L 174f 
Tor hidden service, 175f. 176 
Tor Wiki, _LJZ 173f 
XAMPP,_LZ4 
TOR bundle, 4J 
Torchan. 174 7 1 741* 

Trendsmap, .66 
Trojan, 206 




Truecaller, 74-75 
Tweetbeep, .66 
Twitter, 2±L_3(L 31 f 
Topsy,_65, 65f 
Trendsmap, 66 
Tweetbeep, 66 
Twiangulate,66 


U 

Ultrasurf. 152-153, 153f 
Uniform resource locator (URL), 6-8 

V 

Virtualization, classifications, 7-8 
Virtual private network (VPN), 161 
CybcrGhost, 161-162, 162f 
Hideman, 163-164, 163f 
Virtual world. 19 
Virus, 206 

Vital Information Resources Under Seize, 206 

w 

Wappalyzcr, 

WayBack Machine, 69. 

W3dt,6S 

Weak passwords, 208 
WebJLfL 19-20 
Web 3.0. 32 
Web-based proxy, 156 

anonymouse.org, 156-158. 156f— 157f 
Boomproxy.com, 159-160 
FilterBypass, 159, 159f 
Zend2, 158-159, 158f 
Web browser, 7^ 

WEBINT,_L6 
WebKit, 35 

Weblogs/blogs, 18-19, 18f 
Web of trust (WOT).2JLL 21 If 
Web search engine, 1_ 

Whitehat Aviator, 44-45
W hois, 68 
Whoworks.at,50 
Wise Data Recovery, 270, 270f 
Wolfram Alpha, 71-72, 72f 
World Wide Web (WWW) 
vs. internet, 3^ 
media types, _3 


X 

Xmind. 199-201, 200f 

Y 

Yahoo 

contents, 88 
operators 
88 

+, 88-90 
define, 89 
intitlc, 89-90, 90f 
link, 88-89, 89f 
OR, 88 
site, 88 

Yahoo Pipes, 121-124, 123f 

Yandex, .90 
defined. 90 
operators, 91 
/,M 
J, 92-93 
!!,.93 
93-94 
0,i23*93f 
*,_94 
&, 91 
&&, 91 
+,90-99 
~, 91 
«,M 
Cat, 98-99 
date, 96-97 
domain, 97 
host, .96 
inurl,.95 
lang, 97-98 

mimediletype. 95-96, 95f 

/number, 91-92 

rhost, .96 

site, .96 

title, 94-95 

url,.95 

Yasni,62 

Z 

Zend2, 158-159, 158f 

Zoominfo.iiO 

